Abstract. Numerous dimensionality reduction problems in data analysis involve the recovery of low-
dimensional models or the learning of manifolds underlying sets of data. Many manifold learning methods 
require the estimation of the tangent space of the manifold at a point from locally available data samples. 
Local sampling conditions such as (i) the size of the neighborhood (sampling width) and (ii) the number 
of samples in the neighborhood (sampling density) affect the performance of learning algorithms. In this 
work, we propose a theoretical analysis of local sampling conditions for the estimation of the tangent space 
at a point $P$ lying on an $m$-dimensional Riemannian manifold $S$ in $\mathbb{R}^n$. Assuming a smooth embedding of
$S$ in $\mathbb{R}^n$, we estimate the tangent space $T_P S$ by performing a Principal Component Analysis (PCA) on
points sampled from the neighborhood of P on S. Our analysis explicitly takes into account the second 
order properties of the manifold at P, namely the principal curvatures as well as the higher order terms. 
We consider a random sampling framework and leverage recent results from random matrix theory to de- 
rive conditions on the sampling width and the local sampling density for an accurate estimation of tangent 
subspaces. We measure the estimation accuracy by the angle between the estimated tangent space $\hat{T}_P S$ and
the true tangent space $T_P S$ and we give conditions for this angle to be bounded with high probability. In
particular, we observe that the local sampling conditions are highly dependent on the correlation between 
the components in the second-order local approximation of the manifold. We finally provide numerical 
simulations to validate our theoretical findings. 



1. Introduction 

A data set that resides in a high-dimensional ambient space and that is locally homeomorphic to a lower-dimensional
Euclidean space constitutes a manifold. For example, a set of signals that is representable by a
parametric model, such as parametrizable visual or acoustic signals, forms a manifold. Data manifolds
are, however, rarely given in an explicit form. The recovery of low-dimensional structures underlying a set of
data, also known as manifold learning, has thus been a popular research problem in recent years. This
is typically achieved by constructing a mapping from the original data in the high-dimensional space to a 
space of much lower dimension. Importantly, most manifold learning methods rely on the assumption that 
the data has a locally linear structure. Of course, for such an assumption to be valid at some reference point 
on the manifold, one has to take into account (i) the size of the neighborhood from which the samples are 
chosen and also, (ii) the number of neighborhood points. For instance, if the manifold is a linear subspace, 
then the neighborhood can be chosen to be arbitrarily large and the number of samples needs to be simply 
greater than the dimension of the manifold. However, most manifolds are typically nonlinear, which prevents 
the selection of an arbitrarily large neighborhood size. Hence, one might expect the existence of an upper 
bound on the neighborhood size. Furthermore, the number of necessary samples is likely to vary according 
to the local characteristics of the manifold. 

The purpose of this work is to analyze the relation between the sampling conditions of a manifold and 
the validity of the local linearity assumption of the data sampled from the manifold. We characterize the 
local linearity of the data with the accuracy of the tangent space estimation. We do a local analysis around 
a point $P$ on a manifold $S$. We examine the deviation between the tangent space $\hat{T}_P S$ estimated using
manifold samples in a neighborhood of $P$, and the true tangent space $T_P S$ at $P$. This deviation is related to
the local geometric properties of the manifold around $P$ and the local sampling conditions. In this paper, $S$
is assumed to be an $m$-dimensional Riemannian manifold in $\mathbb{R}^n$ that can be locally represented with smooth
($C^r$, $r > 2$) mappings, where $m < n$. We consider a random sampling where the orthogonal projections of the
samples to $T_P S$ in a neighborhood of $P$ are uniformly distributed. We derive bounds on the size of the neighborhood and
on the number of samples such that the deviation (i.e., the angle) between $\hat{T}_P S$ and $T_P S$ is upper bounded
with high probability. In particular, our analysis captures the dependency of the sampling conditions on 
the second-order properties of the manifold, namely the local curvature of S at P, and on the higher-order 
terms. Thus, broadly speaking, this work consists of a theoretical analysis of the manifold sampling problem 
that relates the local sampling conditions to the accuracy of the local linearity assumption. This paper 
builds on our preliminary work [1] , where the sampling of manifolds represented with quadratic embeddings 
is examined, and extends the analysis to arbitrary smooth embeddings. We envisage two main applications 
where our study can prove to be useful. Firstly, our results can be used for deducing performance guarantees 
or for determining a good local subset of data samples that gives an accurate estimation of the tangent space 
in manifold learning applications. Secondly, our analysis can also be used in manifold sampling applications, 
i.e., for choosing samples from a manifold with a known parametric model. The discretization of a manifold 
can be achieved in various ways depending on the target application (see for example [2]); however, in certain 
cases one may want to sample the manifold in such a way that the local linearity of the data is preserved 
and the tangent space can be correctly recovered from data samples. 

The manifold learning problem has been extensively studied, and we now provide a brief overview of the
literature, with a special focus on locally linear approximation methods. The manifold structure of data can
be retrieved in various ways, from a global parameterization based on geodesic distances as in ISOMAP [3],
or via locally linear representations as in LLE [4] and Hessian Eigenmaps [5]. The LLE algorithm considers
the locally linear structure of the manifold, where each data sample is approximated by a weighted linear 
combination of its nearest neighbors. Then, the key idea in computing a mapping of the data is the
preservation of these weights in the embedded low-dimensional space. Moreover, there are other algorithms such
as [6] which exploit the local linearity of the data by expressing the tangent plane as a linear combination
of the manifold samples in a local neighborhood. The Hessian Eigenmaps algorithm is similar to LLE in 
the sense that it is based on locally linear approximations of the manifold. However, it has been seen to be 
more robust than LLE as it also takes more detailed geometric characteristics of the manifold into account. 
With similar ideas, an adaptive manifold learning algorithm is presented in [7], where the authors propose 
an adaptive local neighborhood size selection strategy. 

Among the dimensionality reduction methods, one can find many examples of algorithms such as [5], [8],
[9], [10] which apply a local Principal Component Analysis (PCA) for the computation of the tangent space
of the manifold, as we do in this work. In other words, the tangent space is estimated by computing the
eigenvectors of the covariance of the data matrix, where the data samples come from a set of neighbor points 
on the manifold. This step can be seen as a noisy PCA analysis, where the data noise is caused by the 
nonlinear geometry of the embedding, i.e., the deviation of the manifold samples from the tangent space 
as a result of nonzero curvature. The performance of Singular Value Decomposition (SVD) or PCA under 
noise is a well-studied topic. There are many results in the literature that examine the perturbation on the 
singular vectors of a data matrix in the presence of noise. The Davis-Kahan theorem [11] is a classical result
that examines how much the subspace spanned by the eigenvectors of a Hermitian matrix is rotated upon 
the perturbation of the matrix. The Wedin theorem [12] generalizes the analysis to non-Hermitian operators 
by bounding the angle between the estimated and true singular vectors in terms of the separation between 
the eigenvalues of the data matrix. A recent result in [13] addresses the singular vector estimation problem
under assumptions of random perturbation noise and low-rank matrix. Finally, the work in [14] examines the 
bias of random measurement error on PCA and relates the bias to the SNR of the observed data. However, 
the above studies do not involve the geometric structure of the data. There are also many studies that analyze
the performance of PCA for a set of data generated by a specific model. For instance, works such as
[15], [16], [17] address the analysis of the eigenvalues and eigenvectors of the covariance matrix of some data
conforming to a multivariate normal distribution. These works, however, do not specifically consider any
manifold data model either.

Only a few recent works have studied the relation between the PCA performance and the data geometry. 
The work in [18] presents an interesting study that generalizes the idea of diffusion maps in dimensionality
reduction [19] to vector diffusion maps, where the new vector diffusion distance involves the similarity between
the tangent spaces on different manifold points. In their analysis, the authors also provide a soft bound for 
the deviation of the locally estimated tangent space at a reference point (using local PCA) from the true 
tangent space, for a probabilistic sampling of the manifold. In particular, it shows that, when the size e of the
local area for tangent estimation is set to e = 0(K~ m + 2 ) with K being the number of samples on the whole 
manifold, the deviation between the estimated and the true tangent space is typically of 0(e 3 / 2 ). This work 
however considers a global sampling from a compact manifold while we focus on the local manifold geometry. 
Finally, the accuracy of tangent space estimation from noisy manifold samples is analyzed in a work parallel
to ours [20]. The manifold is assumed to be embedded with exactly quadratic forms (similarly to [1]) and
the data consists of manifold samples corrupted with Gaussian noise. The work optimizes the number of
samples (from a fixed set of candidates) that is used for estimating the tangent space by considering the
effect of noise and curvature on the accuracy of estimation. In particular, the optimal number of samples is 
selected as a trade-off between the error due to noise and the error caused by the curvature that respectively 
decreases and increases as the number of samples grows. This study however focuses on manifolds that are 
embedded with exactly quadratic forms and characterized with a subset of noisy samples given a priori. On 
the contrary, we are interested in more generic embeddings with arbitrary smooth functions and we aim at 
characterizing a sampling strategy in terms of the sampling width and density for noiseless manifold samples. 

In our paper, we propose to characterize the local linearity of a manifold by studying the accuracy of the 
tangent space estimation from a local set of randomly selected manifold samples. We propose the following 
contributions. First, we determine a suitable upper bound on the neighborhood size within which random 
manifold sampling can be done. In the derivation of this bound, we consider the asymptotic case $K \to \infty$ so
that the neighborhood size purely depends on the manifold geometry. In particular, our analysis depends on 
(i) the maximum principal curvature of the manifold and (ii) the deviation of the manifold from its second- 
order approximation. Our main results are stated precisely in Lemma 2 for the quadratic embedding case and
in Lemma 4 for the more general smooth embedding case. They show the dependency of the neighborhood
size on the correlation between the components in the second-order local approximation of the manifold.
Second, we compute a bound on the minimum number of samples for accurate tangent space estimation,
given that the sampling is performed randomly in a neighborhood whose size conforms with Lemmas 2 and
4. We utilize recent results from random matrix theory [21], [22] in our analysis. We state the precise
expression for this bound on the number of samples in Lemma 3. Combining the two above results, we give
a complete characterization of the local sampling conditions in the form of main theorems, namely Theorem
1 for the quadratic embedding case and Theorem 2 for the more general smooth embedding case. We finally
discuss potential applications of the new theoretical results proposed in this paper, in respectively manifold 
learning and manifold sampling problems. 

The rest of the paper is organized as follows. In Section 2, we first define the notations used in the paper
and then give a formal statement of the problem along with the assumptions made. For ease of readability,
the main results of the paper are presented in Section 3. We then present in Section 4 a detailed analysis
of the local sampling conditions for tangent space estimation at a reference point $P$ on $S$. In particular,
Sections 4.2 and 4.3 contain the sampling analysis for the case when the embedding is assumed to be exactly
quadratic at $P$. In Sections 4.4 and 4.5, we analyze the more general scenario of $m$-dimensional smooth
embeddings in $\mathbb{R}^n$. Section 5 presents simulation results on synthetically generated smooth manifolds. In
Section 6, we provide a discussion regarding the usage of our theoretical results in practical applications.
Finally, in Section 7, we provide concluding remarks along with possible directions for future work.



2. Problem Formulation 

In this section we first define the notations used in the paper. We then define our manifold approximation
framework. We finally state formally the problem of tangent space estimation that is studied in this paper. 

2.1. Notations. Let $S \subset \mathbb{R}^n$ be a manifold and $P \in S$ be a reference point on the manifold where the local
sampling analysis is performed. We denote the dimension of the manifold $S$ by $m$. The tangent space at
$P \in S$ is represented by $T_P S$, and $T_P S^{\perp}$ is used to denote the orthogonal complement of $T_P S$ in $\mathbb{R}^n$. The
notation $C^r$ is used for denoting $r$ times continuous differentiability.

We denote the $\ell_p$-norm of a vector $x \in \mathbb{R}^n$, $1 \le p < \infty$, by $\|x\|_p := (\sum_{i=1}^{n} |x_i|^p)^{1/p}$ and its $\ell_\infty$-norm by
$\|x\|_\infty := \max_i |x_i|$. The inner product between $x, y \in \mathbb{R}^n$ is denoted by $\langle x, y \rangle := x^T y$. Furthermore, we
represent a canonical vector in $\mathbb{R}^n$ by $e_j$ for $j = 1, \dots, n$, where $e_j$ has a 1 at the $j$-th position and 0 at all
other positions.
Figure 1. The manifold $S$ and the $\epsilon$-neighborhood of a manifold point $P \in S$.

Given a matrix $X \in \mathbb{R}^{p \times q}$, we have by its (reduced) singular value decomposition (SVD) [53] the
factorization $X = U \Sigma V^T$ where $U \in \mathbb{R}^{p \times s}$ and $V \in \mathbb{R}^{q \times s}$ are the singular vector matrices with orthonormal columns.
The dimension $s \le \min(p, q)$ corresponds to the rank of $X$. The matrix $\Sigma = \mathrm{diag}(\sigma_1(X), \dots, \sigma_s(X))$ is a
diagonal matrix where $\sigma_1(X) \ge \dots \ge \sigma_s(X) > 0$ are the singular values of $X$. We denote the Frobenius
norm of $X$ (the $\ell_2$-norm of its vector of singular values) by $\|X\|_F := (\mathrm{Tr}(X^T X))^{1/2}$ and its operator norm
(the largest singular value) by $\|X\|$. For any square matrix $X \in \mathbb{R}^{p \times p}$, we denote the trace by $\mathrm{Tr}(X)$ and
the determinant by $\det(X)$.
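
As a quick numerical illustration of the two matrix norms defined above, the following sketch checks them against the singular values of a small matrix. The test matrix, its size, and the random seed are arbitrary choices made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))  # arbitrary p x q matrix (p = 5, q = 3), illustration only

# Reduced SVD: X = U @ diag(sigma) @ Vt, with sigma sorted in descending order.
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

# Frobenius norm from its definition (Tr(X^T X))^{1/2}.
frobenius = np.sqrt(np.trace(X.T @ X))

# Operator norm: the largest singular value.
operator = sigma[0]

# The Frobenius norm should match the l2-norm of the singular value vector.
l2_of_singular_values = np.linalg.norm(sigma)
```

Here `frobenius` and `l2_of_singular_values` agree up to floating-point error, which is the identity stated in the text.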

For a symmetric matrix $X \in \mathbb{R}^{p \times p}$, $X = X^T$, we have the eigenvalue decomposition $X = U \Lambda U^T$. Here
$\Lambda = \mathrm{diag}(\lambda_1(X), \dots, \lambda_p(X))$ denotes the eigenvalue matrix with $\lambda_1(X) \ge \dots \ge \lambda_p(X)$ and $U \in \mathbb{R}^{p \times p}$ is a
unitary matrix so that $U U^T = U^T U = I$. If $X$ is symmetric and positive semidefinite we then have $\lambda_i(X) \ge 0$
for $i = 1, \dots, p$. We denote the spectral radius of a symmetric matrix $X$ by $\rho(X) = \max_i |\lambda_i(X)|$.

Throughout the paper, $\mathbb{E}[\cdot]$ is used for denoting the expectation and $\mathbb{P}(\cdot)$ for denoting the probability.

2.2. Framework. We consider an $m$-dimensional submanifold $S$ of $\mathbb{R}^n$ with a smooth embedding in $\mathbb{R}^n$,
$n \ge m + 1$. Let $\mathcal{N}_\epsilon(P)$ denote an $\epsilon$-neighborhood of $P$ for some $\epsilon > 0$, where

$$\mathcal{N}_\epsilon(P) = \{ M \in S : \| M - P \|_2 < \epsilon \}.$$

The neighborhood of $P$ on $S$ is illustrated in Fig. 1.

In this work, as we represent points in $\mathcal{N}_\epsilon(P)$ via tangent space parameterization using local functions
$f_l : T_P S \to \mathbb{R}$, we are interested in the mapping that orthogonally projects the manifold points in a
neighborhood of $P$ to $T_P S$. In [24], Niyogi et al. provide a characterization of the neighborhood of $P$ within
which this mapping is one-to-one, through the condition number of the manifold. Therefore, there exists an
$\epsilon$ such that all points $M \in \mathcal{N}_\epsilon(P)$ can be uniquely represented in the form

$$[x^T \; f_1(x) \; \dots \; f_{n-m}(x)]^T. \qquad (2.1)$$

Here $x = [x_1 \dots x_m]^T$ denotes the coordinates of the orthogonal projection of a point on $T_P S$. Note that,
in (2.1), the coordinates are with respect to the point $P$ that is the reference point, i.e., the local origin.
Furthermore, the tangent space $T_P S$ at $P$ can be represented as

$$T_P S = \mathrm{span}\{e_1, \dots, e_m\},$$

where $e_j \in \mathbb{R}^n$ denote the canonical vectors.

Figure 2. The true tangent space $T_P S$ and the estimated tangent space $\hat{T}_P S$ at point $P$.

Now, we further assume the smoothness of the embedding to be $C^r$, $r > 2$, implying that each

$$f_l : T_P S \to \mathbb{R}, \quad l = 1, \dots, n - m,$$

is a $C^r$-smooth function in the variables $(x_1, \dots, x_m)$. Since $\nabla f_l(0) = 0$, we have by the Taylor expansion of
$f_l$ around the origin (i.e., $P$) the following identity:

$$f_l(x) = f_{q,l}(x) + O(\|x\|_2^3); \quad l = 1, \dots, n - m, \qquad (2.2)$$

where $f_{q,l}$ is a quadratic form. As a special case, we have a quadratic embedding at $P$ when each $f_l$ is an
exact quadratic form, i.e.,

$$f_l(\cdot) = f_{q,l}(\cdot); \quad l = 1, \dots, n - m.$$

Consider the Hessian of $f_l$ at the local origin $P$, which is given as

$$\nabla^2 f_l(0) = V_l \Lambda_l V_l^T,$$

where $\Lambda_l = \mathrm{diag}(\mathcal{K}_{l,1}, \mathcal{K}_{l,2}, \dots, \mathcal{K}_{l,m})$. Here $\mathcal{K}_{l,1}, \mathcal{K}_{l,2}, \dots, \mathcal{K}_{l,m}$ are the principal curvatures of the
hypersurface

$$S_l = \{ [x_1 \dots x_m \; f_l(x_1, \dots, x_m)] : [x_1 \dots x_m]^T \in T_P S \} \subset \mathbb{R}^{m+1}$$

defined by $f_l$. We then define the maximum principal curvature at $P$ as

$$\mathcal{K}_{max} := \mathcal{K}_{l',j'} \quad \text{where} \quad (l', j') = \arg\max_{l,j} |\mathcal{K}_{l,j}|.$$
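
The principal curvatures and the maximum principal curvature defined above can be obtained numerically by eigen-decomposing each Hessian. The sketch below assumes the Hessians are supplied as a stacked array; the function name `max_principal_curvature` and the two example Hessians are hypothetical and serve only as an illustration.

```python
import numpy as np

def max_principal_curvature(hessians):
    """Return (K_max, (l, j), curvatures) for Hessians of shape (n - m, m, m).

    Each symmetric Hessian is eigen-decomposed; its eigenvalues play the role of
    the principal curvatures K_{l,1}, ..., K_{l,m}. K_max is the curvature of
    largest absolute value (its sign is kept). Indices are zero-based.
    """
    curvatures, _ = np.linalg.eigh(hessians)  # batched; ascending eigenvalues per Hessian
    flat_index = np.argmax(np.abs(curvatures))
    l, j = np.unravel_index(flat_index, curvatures.shape)
    return curvatures[l, j], (int(l), int(j)), curvatures

# Hypothetical example with m = 2 and n - m = 2.
example_hessians = np.array([
    [[2.0, 0.0], [0.0, -1.0]],   # eigenvalues -1 and 2
    [[1.0, 2.0], [2.0, 1.0]],    # eigenvalues -1 and 3
])
k_max, index, curvatures = max_principal_curvature(example_hessians)
```

For these example Hessians the largest curvature in absolute value is 3, attained by the second function.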

We consider that the tangent space can be estimated from sample points in $\mathcal{N}_\epsilon(P)$ through a PCA
decomposition. More precisely, let us consider $K$ points $\{P_i\}_{i=1}^{K}$ sampled from $\mathcal{N}_\epsilon(P)$. Let $M^{(K)}$ denote the
local covariance matrix where

$$M^{(K)} = \frac{1}{K} \sum_{i=1}^{K} P_i P_i^T = U \Lambda U^T.$$

The matrices $U$ and $\Lambda \in \mathbb{R}^{n \times n}$ represent the eigenvector and eigenvalue matrices respectively of $M^{(K)}$, where

$$U = [u_1 \dots u_m \dots u_n], \qquad \Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_m, \dots, \lambda_n),$$

with the ordering $\lambda_1 \ge \dots \ge \lambda_m \ge \dots \ge \lambda_n$. The optimal $m$-dimensional linear subspace at $P$ in the least
squares sense is then given by the span of the eigenvectors associated with the $m$ largest eigenvalues of $M^{(K)}$, i.e.,

$$\hat{T}_P S := \mathrm{span}\{u_1, \dots, u_m\}.$$
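
The estimation step just described can be sketched in a few lines. This is a minimal illustration under the assumption that the samples are stored row-wise and already expressed relative to $P$ (the local origin); the function name `estimate_tangent_space` and the flat test data are hypothetical.

```python
import numpy as np

def estimate_tangent_space(points, m):
    """Local PCA estimate of the tangent space.

    points: (K, n) array whose rows are the samples P_i, relative to P.
    Returns an (n, m) matrix whose orthonormal columns are the eigenvectors of
    M = (1/K) sum_i P_i P_i^T associated with the m largest eigenvalues.
    """
    K = points.shape[0]
    M = points.T @ points / K               # uncentered local covariance matrix
    eigvals, eigvecs = np.linalg.eigh(M)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # reorder to descending
    return eigvecs[:, order[:m]]

# Sanity check on a flat 2-dimensional "manifold" in R^3 (third coordinate zero):
rng = np.random.default_rng(0)
flat = np.hstack([rng.uniform(-1, 1, size=(200, 2)), np.zeros((200, 1))])
U_hat = estimate_tangent_space(flat, m=2)
```

For the flat example the estimated basis has no component along the third axis, so the estimate coincides with the true tangent space, as expected for a linear subspace.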

The tangent space $T_P S$ and its estimation $\hat{T}_P S$ are illustrated in Fig. 2. Finally, we characterize the accuracy
of our estimation with the angle between the estimated and the true tangent spaces. The notion of 'angle'
between two linear subspaces as defined in [25] is given in Definition 1.

Definition 1. The angle $\angle A, B$ between two subspaces $A = \mathrm{span}\{a_1, \dots, a_p\}$ and $B = \mathrm{span}\{b_1, \dots, b_q\}$
of a Euclidean space $\mathbb{R}^n$, where the $a_i$'s and $b_i$'s are orthonormal vectors, is defined as

$$\cos^2 \angle A, B := \det(W^T W),$$

where $[W^T]_{i,k} := \langle a_i, b_k \rangle$ is a $p \times q$ matrix, with $1 \le p \le q < \infty$.
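
Definition 1 translates directly into code. The following sketch assumes that both subspaces are given by matrices with orthonormal columns; the function name `subspace_angle` and the test vectors are illustrative choices, and the clipping step is a numerical safeguard that is not part of the definition.

```python
import numpy as np

def subspace_angle(A, B):
    """Angle between span(A) and span(B) following Definition 1.

    A: (n, p) and B: (n, q) with orthonormal columns and p <= q.
    Uses cos^2(angle) = det(W^T W), where [W^T]_{i,k} = <a_i, b_k>.
    """
    Wt = A.T @ B                        # p x q matrix of inner products, i.e. W^T
    cos2 = np.linalg.det(Wt @ Wt.T)     # W^T W is the p x p matrix Wt @ Wt.T
    cos2 = np.clip(cos2, 0.0, 1.0)      # guard against round-off outside [0, 1]
    return np.arccos(np.sqrt(cos2))

# A line in R^3 versus a plane tilted by theta away from it.
theta = 0.3
A = np.array([[1.0], [0.0], [0.0]])
B = np.array([[np.cos(theta), 0.0],
              [0.0, 1.0],
              [np.sin(theta), 0.0]])
angle = subspace_angle(A, B)
```

In this example the projection of the line onto the plane shrinks its length by a factor of cos(theta), so the computed angle equals theta.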

Observe that the definition can be applied to subspaces that are not necessarily of the same dimension.
Geometrically speaking,

$$\cos \angle A, B := \frac{V_1}{V_2},$$

where $V_1$ is the volume of the $p$-dimensional parallelepiped spanned by the projection of $\{a_1, \dots, a_p\}$ on
$B$ and $V_2$ is the volume of the $p$-dimensional parallelepiped spanned by $\{a_1, \dots, a_p\}$. Therefore, in order
to compare $\hat{T}_P S$ and $T_P S$, one could also consider the distance between the respective projection matrices
$E E^T$ and $U^{(m)} U^{(m)T}$ through

$$\| E E^T - U^{(m)} U^{(m)T} \|_F, \qquad (2.3)$$

where $E = [e_1 \dots e_m]$ and $U^{(m)} = [u_1 \dots u_m]$. Note that an upper bound on $|\angle \hat{T}_P S, T_P S|$ implies a
corresponding upper bound on $\| E E^T - U^{(m)} U^{(m)T} \|_F$. Finally, our choice of using Definition 1 for estimating
angles is motivated by the measure of the geometric deviation of $\hat{T}_P S$ from $T_P S$. However, one could also
work with the error criterion of Eq. (2.3) with no change in the analysis and sampling conditions.$^1$

2.3. Problem statement. Given the above settings, we want to describe the conditions on the manifold
samples $\{P_i\}_{i=1}^{K}$ such that, for a given error bound $\phi \in (0, \pi/2)$ on the tangent space estimation,

$$|\angle \hat{T}_P S, T_P S| < \phi$$

is ensured. In particular, for a given error bound $\phi$, we would like to answer the following questions:

• Question 1: What would be a suitable upper bound on the sampling distance, i.e., the distance of $P_i$ from
$P$? In particular, for large embedding dimensions $n$, what is the nature of the dependency of this bound
on $n$, $m$ and $\mathcal{K}_{max}$?

• Question 2: Given that the points $\{P_i\}_{i=1}^{K}$ are sampled such that the sampling distance satisfies the
sampling distance bound, what would be a suitable lower bound on the sampling density $K$? In particular,
for large embedding dimensions $n$, what is the nature of the dependency of this bound on $n$, $m$ and $\mathcal{K}_{max}$?



In order to answer the above questions, we consider a random sampling framework where we assume
that the coordinates of the orthogonal projections of manifold samples on $T_P S$ are distributed uniformly
in the region $[-\nu, \nu]^m \subset T_P S$. In other words, denoting the coordinates of the projection of $P_i$ on $T_P S$ by
$x_i = [x_1^{(i)} \dots x_m^{(i)}]^T$, we assume that

$$x_j^{(i)} \sim \mathcal{U}[-\nu, \nu] \ \text{i.i.d.}, \quad i = 1, \dots, K; \ j = 1, \dots, m,$$

where $\mathcal{U}$ denotes the uniform distribution. Therefore, we characterize the sampling distance in Question 1
by the parameter $\nu$, which we shall refer to as the sampling width in our analysis.$^2$
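
The random sampling model can be simulated as follows. This sketch makes a simplifying assumption that is not imposed by the text at this point: the local functions are taken to be exact quadratic forms $f_l(x) = \frac{1}{2} x^T H_l x$, with the Hessians $H_l$ supplied by the user. The function name `sample_quadratic_manifold` and the example Hessian are hypothetical.

```python
import numpy as np

def sample_quadratic_manifold(hessians, nu, K, rng):
    """Draw K manifold samples around P under the uniform sampling model.

    hessians: (n - m, m, m) array of symmetric matrices H_l (assumed quadratic embedding).
    nu: sampling width; tangent coordinates are i.i.d. uniform on [-nu, nu].
    Returns a (K, n) array whose rows are [x_1 ... x_m f_1(x) ... f_{n-m}(x)].
    """
    m = hessians.shape[1]
    x = rng.uniform(-nu, nu, size=(K, m))                     # tangent coordinates
    f = 0.5 * np.einsum('ki,lij,kj->kl', x, hessians, x)      # f_l(x_k) = 0.5 x_k^T H_l x_k
    return np.hstack([x, f])

rng = np.random.default_rng(0)
H = np.array([[[1.0, 0.0], [0.0, -2.0]]])   # one normal direction: m = 2, n = 3
pts = sample_quadratic_manifold(H, nu=0.1, K=500, rng=rng)
```

Each row of `pts` is a point of the (assumed quadratic) manifold expressed relative to $P$, so the array can be passed directly to a local PCA routine.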

3. Main results 

We summarize in this section the main results of the paper. We provide sampling conditions for tangent 
space estimation in two different cases; namely, quadratic embeddings and generic smooth embeddings. 

3.1. Quadratic embedding at P. We first consider as a special case the scenario where the manifold $S$
has a quadratic embedding at $P$ in $\mathbb{R}^n$. We present the main sampling theorem in the form of Theorem 1
below. The main purpose of this result is to gain some intuition about the sampling conditions when the
local functions $f_l$ involved in the tangent space parametrization have a purely quadratic form and are
not 'perturbed' by higher-order terms. We refer the reader to Section 4.2 for details regarding the proof and
for a more rigorous analysis.

Theorem 1 (Quadratic manifold sampling). Consider $\{P_i\}_{i=1}^{K}$ to be formed by sampling uniformly at random
from the region $[-\nu, \nu]^m$ around $P$ in $T_P S$, i.e.,

$$x_j^{(i)} \sim \mathcal{U}[-\nu, \nu] \ \text{i.i.d.}, \quad i = 1, \dots, K, \ j = 1, \dots, m.$$

Let $D \in \mathbb{R}^{(n-m) \times (n-m)}$ denote the local correlation matrix for the mappings $\{f_{q,l}\}_{l=1}^{n-m}$ such that

$$[D]_{l,k} = \mathbb{E}[f_{q,l}(x) f_{q,k}(x)]; \quad l, k = 1, \dots, n - m.$$

We then have the following sufficient sampling conditions depending on the structure of $D$, to guarantee a
bound on $|\angle \hat{T}_P S, T_P S|$.

(1) If $D$ is diagonal, i.e., the random variables $\{f_{q,l}(x)\}_{l=1}^{n-m}$ are uncorrelated, then for any $\tau \in (0, 1)$, the
choices

$$\nu = O(m^{-1} |\mathcal{K}_{max}|^{-1}) \quad \text{and} \quad K = O(m n \tau^{-2} \log n), \quad \text{as } n \to \infty$$

ensure that $|\angle \hat{T}_P S, T_P S| < \cos^{-1} \sqrt{(1 - \tau^2)^m}$ holds w.h.p.

1 This is explained in more detail in Lemma ^ and Remark ^

2 See Section ^ for a discussion on how the bound on v relates to the distance in the ambient space. 
(2) If $D$ is dense, i.e., the random variables $\{f_{q,l}(x)\}_{l=1}^{n-m}$ are correlated, then for any $\tau \in (0, 1)$, the choices

$$\nu = O(n^{-1/2} m^{-1} |\mathcal{K}_{max}|^{-1}) \quad \text{and} \quad K = O(\tau^{-2} m^2 \log n), \quad \text{as } n \to \infty$$

ensure that $|\angle \hat{T}_P S, T_P S| < \cos^{-1} \sqrt{(1 - \tau^2)^m}$ holds w.h.p.

Interpretation of Theorem 1. The sampling conditions depend considerably on the correlation between
the components in the second-order approximation of the manifold at $P$. This correlation is
represented by the correlation matrix $D$. We observe that, in the scenario where $\{f_{q,l}\}_{l=1}^{n-m}$ are uncorrelated,
i.e., $D$ is diagonal,$^3$ the bound on the sampling width is independent of the ambient space dimension $n$ and
depends only on $m$ and $\mathcal{K}_{max}$. On the other hand, when $\{f_{q,l}\}_{l=1}^{n-m}$ are correlated, i.e., $D$ is dense, we see
that the bound on $\nu$ behaves as $O(n^{-1/2} m^{-1} |\mathcal{K}_{max}|^{-1})$, indicating that the sampling region needs to shrink
with the increase in ambient space dimension. Of course, the case where $D$ is dense is general and hence the
corresponding bounds for $\nu$ and $K$ hold even if $D$ is in fact diagonal. The point here is that if $D$ is diagonal,
then one can afford to sample the manifold from a larger neighborhood. Furthermore, irrespective of the
structure of the correlation matrix $D$, we observe that the bound on the sampling width depends linearly on
the reciprocal $|\mathcal{K}_{max}|^{-1}$ of the maximum curvature, which is also intuitively expected.

The nature of the correlation between the curvature components also affects the bound on the sampling
density $K$, with the bound being less restrictive when $\{f_{q,l}\}_{l=1}^{n-m}$ are correlated compared to the case when
$\{f_{q,l}\}_{l=1}^{n-m}$ are uncorrelated. The bound is logarithmic in $n$ in the first case, and loglinear in $n$ in the latter
one. This makes sense, since reducing the sampling width $\nu$ causes $\mathcal{N}_\epsilon(P)$ to be more linear and hence loosens
the restrictions on the number of samples required to achieve a given approximation bound on $|\angle \hat{T}_P S, T_P S|$.

Lastly, we remark that the approximation error term $\tau$ arises on account of finite sampling and can be
interpreted as the variance error. In particular, provided that the sampling width $\nu$ is chosen to satisfy the
appropriate bound, then we have that $|\angle \hat{T}_P S, T_P S| \to 0$ in the limit where $K \to \infty$.
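
To make the scalings of Theorem 1 concrete, the following sketch evaluates the orders of $\nu$ and $K$ for given $n$, $m$, $\mathcal{K}_{max}$ and $\tau$. The theorem only specifies orders of growth, so the multiplicative constants `c_nu` and `c_K` are hypothetical placeholders (set to 1 here) and the returned numbers indicate relative behavior only, not actual sufficient values. The function name `theorem1_scalings` is likewise an illustrative choice.

```python
import math

def theorem1_scalings(n, m, k_max, tau, diagonal_D, c_nu=1.0, c_K=1.0):
    """Evaluate the orders of Theorem 1 with placeholder constants c_nu and c_K.

    diagonal_D=True  : nu ~ 1 / (m |K_max|),          K ~ m n log(n) / tau^2
    diagonal_D=False : nu ~ 1 / (sqrt(n) m |K_max|),  K ~ m^2 log(n) / tau^2
    """
    if diagonal_D:
        nu = c_nu / (m * abs(k_max))
        K = c_K * m * n * math.log(n) / tau ** 2
    else:
        nu = c_nu / (math.sqrt(n) * m * abs(k_max))
        K = c_K * m ** 2 * math.log(n) / tau ** 2
    return nu, math.ceil(K)

# Illustrative values only: n = 100, m = 2, |K_max| = 1, tau = 0.5.
nu_diag, K_diag = theorem1_scalings(100, 2, 1.0, 0.5, diagonal_D=True)
nu_dense, K_dense = theorem1_scalings(100, 2, 1.0, 0.5, diagonal_D=False)
```

With these illustrative values the dense case gives a sampling width smaller by a factor of $\sqrt{n} = 10$ and a much smaller sample count, which mirrors the trade-off discussed in the interpretation above.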

3.2. Smooth embedding of S in $\mathbb{R}^n$. We now present our main sampling theorem for the general case of
smooth embeddings of $S$ in $\mathbb{R}^n$ in the form of Theorem 2. For details regarding the proof and for a rigorous
analysis we refer the reader to Section 4.4.

Theorem 2 (Smooth manifold sampling). Consider $\{P_i\}_{i=1}^{K}$ to be formed by sampling uniformly at random
from the region $[-\nu, \nu]^m$ in $T_P S$, i.e.,

$$x_j^{(i)} \sim \mathcal{U}[-\nu, \nu] \ \text{i.i.d.}, \quad i = 1, \dots, K, \ j = 1, \dots, m.$$

Let $D \in \mathbb{R}^{(n-m) \times (n-m)}$ denote the local correlation matrix for the mappings $\{f_{q,l}\}_{l=1}^{n-m}$, such that

$$[D]_{l,k} = \mathbb{E}[f_{q,l}(x) f_{q,k}(x)], \quad l, k = 1, \dots, n - m.$$

We then have the following sufficient sampling conditions depending on the structure of $D$, to guarantee a
bound on $|\angle \hat{T}_P S, T_P S|$.

(1) If $D$ is diagonal, i.e., the random variables $\{f_{q,l}(x)\}_{l=1}^{n-m}$ are uncorrelated, then the choices

$$\nu = O(n^{-1/3} m^{-5/6} |\mathcal{K}_{max}|^{-1/3}) \quad \text{and} \quad K = O(n \log n), \quad \text{as } n \to \infty$$

ensure that \zf P S,T P S\ < cos" 1 ^(1 - 0{n- 1 /3 r rv'/3\lC max \- 4 / :i )) m holds w.h.p. 

(2) If $D$ is dense, i.e., the random variables $\{f_{q,l}(x)\}_{l=1}^{n-m}$ are correlated, then for any $\tau \in (0, 1)$, the choices

$$\nu = O(n^{-1/2} m^{-1} |\mathcal{K}_{max}|^{-1}) \quad \text{and} \quad K = O(\tau^{-2} m^2 \log n), \quad \text{as } n \to \infty$$
ensure that \ZTpS,TpS\ < cos -1 y(i — t 2 — 0(n~ l m\lC max \~' l ))' m ' holds w.h.p. 
Interpretation of Theorem 2. As the manifold $S$ is now smoothly embedded in $\mathbb{R}^n$, the local functions
$f_l$ involved in the tangent space parametrization are arbitrary smooth functions of the form (2.2). Hence,
in this case, the deviation of the manifold from the tangent space, which can be regarded as geometric
noise, arises due to two factors: (i) the curvature components associated with $\{f_{q,l}\}_{l=1}^{n-m}$ and (ii) an extra
component due to the higher-order terms in the Taylor series of $f_l$ (which are $O(\|x\|_2^3)$). Note that the
geometric noise in a quadratic manifold embedding is only due to the curvature components. We see that
when the correlation matrix $D$ is diagonal, on account of the extra noise arising due to (ii), the bound on



3 More details about the case of D being diagonal are given in Section 4.2 Remark^
the sampling width becomes stricter compared to its counterpart in Theorem 1 and now depends on $n$ as
$O(n^{-1/3})$. This is still better than the bound for the general case when $D$ is dense, for which we see the
same order of dependency, i.e., $O(n^{-1/2})$, as for the corresponding case in Theorem 1.

Observe that the bound on the sampling density $K$ has a similar order of dependency on $n$, $m$, and $|\mathcal{K}_{max}|$
as in Theorem 1 (except for the factor $m$ when $D$ is diagonal). Again, a smaller sampling width implies that
$\mathcal{N}_\epsilon(P)$ is closer to being linear and hence fewer samples should suffice to estimate $T_P S$.

In the bounds on $|\angle \hat{T}_P S, T_P S|$, the error term represented by $\tau$ corresponds to the variance due to finite
sampling. The second error term arises on account of the higher-order terms in the Taylor expansion of
$f_l$ and can be interpreted as the bias term due to a nonzero sampling width $\nu$. This bias goes to zero
as $\nu \to 0$. In particular, for a fixed $\nu$ that is chosen to satisfy the appropriate bound, we have that
$|\angle \hat{T}_P S, T_P S|$ approaches a constant bias error term with the variance error vanishing in the limit where
$K \to \infty$. Interestingly, we observe that when $D$ is diagonal, the effect of the variance term due to finite
sampling is negligible in comparison with the bias term. Table 1 summarizes the main sampling conditions
proposed in this paper.



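Correlation of $\{f_{q,l}\}_{l=1}^{n-m}$ | Smooth embedding                                        | Quadratic embedding
------------------------------------|---------------------------------------------------------|---------------------------------------------------
Uncorrelated                        | $\nu = O(n^{-1/3} m^{-5/6} |\mathcal{K}_{max}|^{-1/3})$ | $\nu = O(m^{-1} |\mathcal{K}_{max}|^{-1})$
                                    | $K = O(n \log n)$                                       | $K = O(m n \tau^{-2} \log n)$
Correlated                          | $\nu = O(n^{-1/2} m^{-1} |\mathcal{K}_{max}|^{-1})$     | $\nu = O(n^{-1/2} m^{-1} |\mathcal{K}_{max}|^{-1})$
                                    | $K = O(\tau^{-2} m^2 \log n)$                           | $K = O(\tau^{-2} m^2 \log n)$

TABLE 1. Summary of the sampling conditions at a point $P$ on an $m$-dimensional manifold $S$ in
$\mathbb{R}^n$ ($m < n$) such that $|\angle \hat{T}_P S, T_P S| < \cos^{-1} \sqrt{(1 - \tau^2)^m}$ for some $\tau \in (0, 1)$. The two following
cases are compared: (i) $S$ has a smooth embedding in $\mathbb{R}^n$, (ii) $S$ has a quadratic form in $\mathbb{R}^n$ w.r.t.
the point $P$.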
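
The role of the sampling width can be observed in a small simulation. The sketch below is only an illustration, not the experimental setup of the paper: it assumes an exactly quadratic embedding with $m = 2$, $n = 6$ and four identical local functions $f_l(x) = \frac{1}{2}\|x\|_2^2$ (a strongly correlated case), and all names and parameter values are hypothetical.

```python
import numpy as np

def tangent_angle_experiment(nu, K, hessians, rng):
    """Sample K points with width nu, run local PCA, and return the angle of Definition 1."""
    n_minus_m, m, _ = hessians.shape
    n = m + n_minus_m
    x = rng.uniform(-nu, nu, size=(K, m))                   # tangent coordinates
    f = 0.5 * np.einsum('ki,lij,kj->kl', x, hessians, x)    # quadratic local functions
    pts = np.hstack([x, f])                                 # samples relative to P
    M = pts.T @ pts / K                                     # local covariance matrix
    eigvals, eigvecs = np.linalg.eigh(M)
    U_m = eigvecs[:, np.argsort(eigvals)[::-1][:m]]         # top-m eigenvectors
    E = np.eye(n)[:, :m]                                    # true tangent basis e_1..e_m
    Wt = E.T @ U_m
    cos2 = np.clip(np.linalg.det(Wt @ Wt.T), 0.0, 1.0)
    return np.arccos(np.sqrt(cos2))

rng = np.random.default_rng(0)
m, n = 2, 6
hessians = np.tile(np.eye(m), (n - m, 1, 1))   # assumed: every H_l is the identity
angle_small_width = tangent_angle_experiment(0.05, 2000, hessians, rng)
angle_large_width = tangent_angle_experiment(3.0, 2000, hessians, rng)
```

With the small width the estimated angle is close to zero, whereas with the large width the energy in the normal directions dominates the covariance matrix and the leading eigenvector leaves the tangent space, so the angle approaches $\pi/2$. This matches the qualitative message that the sampling width must be bounded in terms of the curvature.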



4. Analysis 

We now present a detailed analysis of our local sampling results for smooth $m$-dimensional Riemannian
manifolds in $\mathbb{R}^n$. To begin with, we first define the framework for our analysis by introducing the tangent
space parameterization for points in $\mathcal{N}_\epsilon(P)$.

4.1. Framework for tangent space estimation. We discuss first the parametrization that we use in our
analysis. Let $[x_1 \dots x_m \; f_1(x) \dots f_{n-m}(x)]^T$ be a point in $\mathcal{N}_\epsilon(P)$ and let $x = [x_1 \dots x_m]^T$ denote its
orthogonal projection on $T_P S$. The region $\mathcal{N}_\epsilon(P)$ can be represented in terms of $(n - m)$ hypersurfaces of
dimension $m$ in $\mathbb{R}^{m+1}$, where the $l$-th hypersurface is given by

$$S_l = \{ [x_1 \dots x_m \; f_l(x_1, \dots, x_m)] : [x_1 \dots x_m]^T \in T_P S \} \subset \mathbb{R}^{m+1}.$$

Due to the assumption that the embedding is $C^r$, where $r > 2$, the functions $f_l$ have the following form
for all $l = 1, \dots, n - m$:

$$\begin{aligned}
f_l(x) &= f_l(0) + \nabla f_l(0)^T x + \frac{1}{2} x^T \nabla^2 f_l(0) x + R_l(\xi_l) \\
&= 0 + 0^T x + \frac{1}{2} x^T V_l \Lambda_l V_l^T x + R_l(\xi_l) \\
&= \frac{1}{2} \sum_{j=1}^{m} \langle x, v_{l,j} \rangle^2 \mathcal{K}_{l,j} + R_l(\xi_l) \\
&= \frac{1}{2} \|x\|_2^2 \, \mathcal{K}_l(x) + R_l(\xi_l) = f_{q,l}(x) + R_l(\xi_l),
\end{aligned}$$

where $\xi_l \in (0, x)$ depends on $x$. Here, $f_{q,l}$ denotes the quadratic approximation of $f_l$ and $R_l(\xi_l) = O(\|x\|_2^3)$
represents the higher-order remainder terms in its Taylor series. The Hessian of $f_l$ at the origin is represented
by $\nabla^2 f_l(0)$, and

$$V_l = [v_{l,1} \; v_{l,2} \; \cdots \; v_{l,m}]_{m \times m}, \qquad \Lambda_l = \mathrm{diag}(\mathcal{K}_{l,1}, \mathcal{K}_{l,2}, \dots, \mathcal{K}_{l,m})$$

denote respectively the eigenvector and eigenvalue matrices of $\nabla^2 f_l(0)$. Geometrically, $\mathcal{K}_l(x)$ represents the
curvature at point $P$ of the geodesic curve on $S_l$ from $P$ to $[x^T \; f_1(x) \dots f_{n-m}(x)]^T$, where

$$\mathcal{K}_l(x) = \sum_{j=1}^{m} \frac{\langle x, v_{l,j} \rangle^2}{\|x\|_2^2} \, \mathcal{K}_{l,j}.$$


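Given the above setting, recall from Section 2.3 that

\[ \mathcal{K}_{\max} = \mathcal{K}_{l',j'} \quad \text{where} \quad (l',j') := \arg\max_{l,j} |\mathcal{K}_{l,j}|. \]

Here, $|\mathcal{K}_{\max}|$ is the largest absolute value of the principal curvatures among the hypersurfaces $S_l$, for $l =
1,\dots,n-m$. We remark that the sampling conditions derived throughout our analysis capture the second-order
properties of the manifold at $P$ in terms of the maximum curvature $\mathcal{K}_{\max}$. Equipped with the above
parametrization, we can now describe the estimation of the tangent space. Let us consider $K$ points,

\[ P_i = [x_1^{(i)} \ \cdots \ x_m^{(i)} \ f_1(\bar{x}_i) \ \cdots \ f_{n-m}(\bar{x}_i)]^T, \]

in $\mathcal{N}_\epsilon(P)$. Denoting the coordinates of the orthogonal projection of $P_i$ on $T_P S$ as $\bar{x}_i = [x_1^{(i)} \ \cdots \ x_m^{(i)}]^T$
for $i = 1,\dots,K$, we represent the points by the matrix $X^{(K)}$ as follows:

\[
X^{(K)} =
\begin{bmatrix}
x_1^{(1)} & \cdots & x_1^{(K)} \\
\vdots & & \vdots \\
x_m^{(1)} & \cdots & x_m^{(K)} \\
f_1(\bar{x}_1) & \cdots & f_1(\bar{x}_K) \\
\vdots & & \vdots \\
f_{n-m}(\bar{x}_1) & \cdots & f_{n-m}(\bar{x}_K)
\end{bmatrix},
\]

where each $f_l$ has the following form:

\[ f_l(\bar{x}) = \tfrac{1}{2}\sum_{j=1}^{m} \langle \bar{x}, v_{l,j}\rangle^2\,\mathcal{K}_{l,j} + O(\|\bar{x}\|_2^3); \quad l = 1,\dots,n-m. \tag{4.1} \]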



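The optimal $m$-dimensional linear subspace, in the least squares sense, passing through $P$ will be the one
spanned by the eigenvectors corresponding to the $m$ largest eigenvalues of

\[
M^{(K)} = \frac{1}{K} X^{(K)} X^{(K)T}
= \begin{bmatrix} A^{(K)} & B^{(K)} \\ B^{(K)T} & D^{(K)} \end{bmatrix}
= U \Lambda U^T,
\]

where the individual submatrices have the following form:

\[
A^{(K)} = \frac{1}{K}
\begin{bmatrix}
\sum_i (x_1^{(i)})^2 & \cdots & \sum_i x_1^{(i)} x_m^{(i)} \\
\vdots & & \vdots \\
\sum_i x_m^{(i)} x_1^{(i)} & \cdots & \sum_i (x_m^{(i)})^2
\end{bmatrix},
\qquad
B^{(K)} = \frac{1}{K}
\begin{bmatrix}
\sum_i x_1^{(i)} f_1(\bar{x}_i) & \cdots & \sum_i x_1^{(i)} f_{n-m}(\bar{x}_i) \\
\vdots & & \vdots \\
\sum_i x_m^{(i)} f_1(\bar{x}_i) & \cdots & \sum_i x_m^{(i)} f_{n-m}(\bar{x}_i)
\end{bmatrix},
\]

and

\[
D^{(K)} = \frac{1}{K}
\begin{bmatrix}
\sum_i f_1(\bar{x}_i)^2 & \cdots & \sum_i f_1(\bar{x}_i) f_{n-m}(\bar{x}_i) \\
\vdots & & \vdots \\
\sum_i f_{n-m}(\bar{x}_i) f_1(\bar{x}_i) & \cdots & \sum_i f_{n-m}(\bar{x}_i)^2
\end{bmatrix}.
\]

Furthermore,

\[ U = [u_1 \ \cdots \ u_m \ u_{m+1} \ \cdots \ u_n] \quad \text{and} \quad \Lambda = \mathrm{diag}(\lambda_1,\lambda_2,\dots,\lambda_n) \]

are respectively the eigenvector and eigenvalue matrices of $\frac{1}{K}X^{(K)}X^{(K)T}$ with $U^T U = U U^T = I_n$. Assume the
ordering $\lambda_1 \geq \cdots \geq \lambda_m \geq \cdots \geq \lambda_n$. We then have

\[ \hat{T}_P S = \mathrm{span}\{u_1,\dots,u_m\} \quad \text{and} \quad T_P S = \mathrm{span}\{e_1,\dots,e_m\}, \]

where $\{e_j\}_{j=1}^{m}$ denote the first $m$ of the $n$ canonical basis vectors in $\mathbb{R}^n$. Now the angle between $\hat{T}_P S$ and
$T_P S$ as per Definition 1 is given by

\[ \cos^2(\angle \hat{T}_P S, T_P S) := \det(W^T W), \]

where $[W^T]_{ij} = \langle u_i, e_j\rangle$ for $1 \leq i,j \leq m$. Let us denote the first $m$ columns of $U$ by $U^{(m)}$, where

\[ U^{(m)} = \begin{bmatrix} U_1 \\ U_2 \end{bmatrix}, \qquad U_1 \in \mathbb{R}^{m\times m}, \quad U_2 \in \mathbb{R}^{(n-m)\times m}. \]

Lemma 1 states the condition on $\|U_2\|_F$ that guarantees a bound on $|\angle \hat{T}_P S, T_P S|$.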

Lemma 1. Consider $K > m$ points in $\mathcal{N}_\epsilon(P)$ sampled such that $\|U_2\|_F \leq \tau < 1$ for some $0 < \tau < 1$. Then,

\[ |\angle \hat{T}_P S, T_P S| \leq \cos^{-1}\sqrt{(1-\tau^2)^m}. \]

Proof. Clearly $U_1^T U_1 + U_2^T U_2 = I_{m\times m}$. Let $E = [e_1 \ \dots \ e_m]$. We have

\[ W^T W = (U^{(m)T} E)(E^T U^{(m)}) = U_1^T U_1 = I_{m\times m} - U_2^T U_2. \]

Denoting the eigenvalues of $U_2^T U_2$ as $\mu_1,\dots,\mu_m$, we observe that

\[ \mathrm{Tr}(U_2^T U_2) = \|U_2\|_F^2 \geq \mu_{\max}, \]

where $\mu_{\max} = \max_{i=1,\dots,m}\mu_i$. Using this result in conjunction with Definition 1, we arrive at the following
inequality:

\[ \cos^2(\angle \hat{T}_P S, T_P S) := \det(I_{m\times m} - U_2^T U_2) = \prod_{i=1}^{m}(1-\mu_i) \geq (1-\mu_{\max})^m. \]

Hence, the following bound on $|\angle \hat{T}_P S, T_P S|$ clearly holds if $\|U_2\|_F \leq \tau < 1$:

\[ \cos^2(\angle \hat{T}_P S, T_P S) \geq (1-\tau^2)^m \iff |\angle \hat{T}_P S, T_P S| \leq \cos^{-1}\sqrt{(1-\tau^2)^m}. \]

□
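As a numerical sanity check of Lemma 1, the following minimal sketch builds a hypothetical orthonormal estimate $U^{(m)}$ and verifies that the angle computed from $\det(U_1^T U_1)$ stays below the bound obtained from $\tau = \|U_2\|_F$. The perturbed canonical basis, the dimensions and all variable names are assumptions chosen only for illustration; they do not come from the analysis itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 3

# Hypothetical estimate U^(m): perturb the canonical basis E, then
# re-orthonormalize with a reduced QR factorization (illustration only).
E = np.eye(n)[:, :m]
U_m, _ = np.linalg.qr(E + 0.05 * rng.standard_normal((n, m)))

U1, U2 = U_m[:m, :], U_m[m:, :]      # tangent block and normal block
tau = np.linalg.norm(U2, "fro")      # ||U_2||_F

# cos^2 of the angle: det(W^T W) with W^T W = U_1^T U_1 = I - U_2^T U_2.
cos2 = float(np.clip(np.linalg.det(U1.T @ U1), 0.0, 1.0))
lower = (1.0 - tau**2) ** m          # lower bound on cos^2 from Lemma 1

angle = np.degrees(np.arccos(np.sqrt(cos2)))
angle_bound = np.degrees(np.arccos(np.sqrt(lower)))
print(f"||U2||_F = {tau:.4f}, angle = {angle:.3f} deg, bound = {angle_bound:.3f} deg")
```

The bound is loose by construction, since it replaces every eigenvalue $\mu_i$ of $U_2^T U_2$ by the full trace.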

Remark 1. We remark here that one can compare the column spaces of $E$ and $U^{(m)}$ by also computing the
difference between their projection matrices, i.e., $\|EE^T - U^{(m)}U^{(m)T}\|_F$. It is easily verifiable that

\[ \|EE^T - U^{(m)}U^{(m)T}\|_F^2 = 2\,\|U_2\|_F^2. \tag{4.2} \]

Hence when $\|U_2\|_F \leq \tau < 1$, we have $\|EE^T - U^{(m)}U^{(m)T}\|_F \leq \sqrt{2}\,\tau$. The core of our analysis involves
deriving sampling conditions which guarantee that $\|U_2\|_F$ is suitably upper bounded. Hence one can
interchangeably use Eq. (4.2) or the notion of angle in Definition 1 to compare $\hat{T}_P S$ and $T_P S$ with no change in
the analysis and the sampling conditions derived later on. The only change would be in the expression for
the error bound, where instead of $\cos^{-1}\sqrt{(1-\tau^2)^m}$ one would have the error term $\sqrt{2}\,\tau$. Our choice of using
Definition 1 is purely motivated by our objective of measuring the deviation of $\hat{T}_P S$ from $T_P S$ in a geometric
way.
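The identity in Eq. (4.2) follows from $\|EE^T - U^{(m)}U^{(m)T}\|_F^2 = 2m - 2\|U_1\|_F^2$ together with $\|U_1\|_F^2 + \|U_2\|_F^2 = m$. A minimal numerical check, assuming a random orthonormal $U^{(m)}$ (a hypothetical choice made only for illustration), could look as follows.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 10, 4

E = np.eye(n)[:, :m]                                   # basis of the true tangent space
U_m, _ = np.linalg.qr(rng.standard_normal((n, m)))     # random orthonormal n x m matrix
U2 = U_m[m:, :]                                        # normal block of U^(m)

# Left and right sides of Eq. (4.2).
lhs = np.linalg.norm(E @ E.T - U_m @ U_m.T, "fro") ** 2
rhs = 2.0 * np.linalg.norm(U2, "fro") ** 2
print(f"projection distance^2 = {lhs:.6f}, 2*||U2||_F^2 = {rhs:.6f}")
```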

Finally, we note that, if the manifold $S$ is a linear subspace of $\mathbb{R}^n$, then we have $B^{(K)} = 0$ and $D^{(K)} = 0$,
implying $U_2$ to be trivially equal to zero. In other words, we have $\angle \hat{T}_P S, T_P S = 0$ for any $K > m$. However,
when $S$ is a more general manifold, its nonlinearity manifests itself in the form of error arising due
to the local mappings $\{f_l\}_{l=1}^{n-m}$. Hence, in order to obtain a good locally linear approximation of $T_P S$, one
intuitively expects that the points in $\mathcal{N}_\epsilon(P)$ are sampled sufficiently close to $P$. In particular, one might
wonder how far from $P$ points can be sampled and also how many points need to be sampled in order to
achieve a good approximation guarantee on $|\angle \hat{T}_P S, T_P S|$. We now proceed to rigorously analyze these two
questions in the following sections.

4.2. Accuracy of tangent space estimation for quadratic embeddings. We assume first that $\mathcal{N}_\epsilon(P)$
is representable in terms of quadratic forms at the reference point $P$. In other words, for any point
$[x_1 \ \dots \ x_m \ f_1(\bar{x}) \ \dots \ f_{n-m}(\bar{x})]^T$ in $\mathcal{N}_\epsilon(P)$, we have

\[ f_l(\bar{x}) = f_{q,l}(\bar{x}), \quad l = 1,\dots,n-m, \]

where $f_{q,l}(\cdot)$ denotes the second order approximation of $f_l(\cdot)$.

We consider the points $\{P_i\}_{i=1}^{K}$ to be formed by sampling independently and uniformly at random in $T_P S$
such that

\[ x_j^{(i)} \sim \mathcal{U}[-\nu,\nu] \ \text{i.i.d.}, \quad \forall\, i = 1,\dots,K \ \text{and} \ j = 1,\dots,m. \]
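The sampling model and the PCA estimate described so far can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes axis-aligned principal directions ($v_{l,j} = e_j$), and the function names, the curvature matrix `curv` and the parameter values are hypothetical.

```python
import numpy as np

def sample_quadratic_patch(K, n, m, nu, curv, rng):
    """Return the n x K matrix X^(K) for f_l(x) = 0.5 * sum_j curv[l, j] * x_j**2."""
    x = rng.uniform(-nu, nu, size=(m, K))    # tangent coordinates, i.i.d. U[-nu, nu]
    f = 0.5 * curv @ (x ** 2)                # (n - m) x K normal coordinates
    return np.vstack([x, f])

def estimate_tangent_angle(X, m):
    """Angle (degrees) between the PCA estimate and span{e_1, ..., e_m}."""
    K = X.shape[1]
    M = (X @ X.T) / K                        # local correlation matrix M^(K)
    _, eigvecs = np.linalg.eigh(M)           # eigenvalues in ascending order
    U1 = eigvecs[:m, -m:]                    # tangent block of the m leading eigenvectors
    cos2 = np.clip(np.linalg.det(U1.T @ U1), 0.0, 1.0)
    return float(np.degrees(np.arccos(np.sqrt(cos2))))

rng = np.random.default_rng(0)
n, m, K = 20, 3, 2000
curv = rng.uniform(0.0, 2.0, size=(n - m, m))   # hypothetical principal curvatures

angle_small = estimate_tangent_angle(sample_quadratic_patch(K, n, m, 0.05, curv, rng), m)
angle_large = estimate_tangent_angle(sample_quadratic_patch(K, n, m, 2.0, curv, rng), m)
print(f"nu = 0.05: {angle_small:.2f} deg, nu = 2.0: {angle_large:.2f} deg")
```

With a small sampling width the tangent directions dominate the spectrum of $M^{(K)}$; with a large width the curvature terms take over the leading eigenvectors and the estimate degrades.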



To begin with, Lemma 2 states precisely the condition on $\nu$ which guarantees that $\hat{T}_P S = T_P S$ in the limit
where $K \to \infty$.

Lemma 2. As $K \to \infty$, $[M^{(K)}]_{ij} \to [M]_{ij}$ a.s. for every $1 \leq i,j \leq n$, where

\[
M = \begin{bmatrix} \frac{\nu^2}{3} I_{m\times m} & 0_{m\times(n-m)} \\ 0_{(n-m)\times m} & D_{(n-m)\times(n-m)} \end{bmatrix},
\qquad [D]_{l,k} = \mathbb{E}[f_l(\bar{x}) f_k(\bar{x})] = \mathbb{E}[f_{q,l}(\bar{x}) f_{q,k}(\bar{x})],
\]

and $l,k = 1,\dots,n-m$. Furthermore, the following holds.

(i) Let $D$ be diagonal, i.e., let $\{f_{q,l}\}_{l=1}^{n-m}$ be uncorrelated. Then, if the sampling width satisfies

\[ \nu < \sqrt{\frac{60}{m(5m+4)|\mathcal{K}_{\max}|^2}}, \]

it holds that $\mathbb{P}(|\angle \hat{T}_P S, T_P S| > 0) \to 0$ as $K \to \infty$.

(ii) Let $D$ be dense, i.e., let $\{f_{q,l}\}_{l=1}^{n-m}$ be correlated. Then, if the sampling width satisfies

\[ \nu < \sqrt{\frac{60}{m(n-m)(5m+4)|\mathcal{K}_{\max}|^2}}, \]

it holds that $\mathbb{P}(|\angle \hat{T}_P S, T_P S| > 0) \to 0$ as $K \to \infty$.

The proof of Lemma 2 is presented in Appendix A.1. The main idea here is to observe that the eigenspace
corresponding to the eigenvalue $\frac{\nu^2}{3}$ is equal to the span of $\{e_1,\dots,e_m\}$, which is the same as $T_P S$. Hence,
the condition on the sampling width follows from the requirement that the noiseless spectrum associated with
$\frac{\nu^2}{3} I_m$ is separated from the noisy spectrum associated with $D$ arising on account of the manifold's curvature
at $P$. In other words,

\[ \frac{\nu^2}{3} > \rho(D), \]

where $\rho(D)$ denotes the spectral radius of $D$. For the sake of brevity, let us denote the bound on $\nu$ by

\[ \nu_{bound,quad} = 1/\sqrt{3RL}, \]

where

\[
L = \frac{m(5m+4)|\mathcal{K}_{\max}|^2}{180} \quad \text{and} \quad
R = \begin{cases} 1, & \text{if } D \text{ is diagonal}, \\ (n-m), & \text{if } D \text{ is dense}. \end{cases}
\]

Therefore, $\nu_{bound,quad}$ depends on the structure of $D$.
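For concreteness, the width bound $\nu_{bound,quad} = 1/\sqrt{3RL}$ can be evaluated directly. The sketch below uses a hypothetical helper name and example values ($m = 5$, $n = 100$, $|\mathcal{K}_{\max}| = 10$) and checks that the expression agrees with the square-root form stated in Lemma 2.

```python
import math

def nu_bound_quad(m, n, k_max, dense=True):
    """Sampling width bound 1/sqrt(3RL) for a quadratic embedding."""
    L = m * (5 * m + 4) * k_max**2 / 180.0
    R = (n - m) if dense else 1          # R = 1 when D is diagonal
    return 1.0 / math.sqrt(3.0 * R * L)

m, n, k_max = 5, 100, 10.0
nu_dense = nu_bound_quad(m, n, k_max, dense=True)
nu_diag = nu_bound_quad(m, n, k_max, dense=False)

# Equivalent closed form from Lemma 2 for the dense case.
nu_dense_lemma = math.sqrt(60.0 / (m * (n - m) * (5 * m + 4) * k_max**2))
print(f"dense: {nu_dense:.6f}, diagonal: {nu_diag:.6f}")
```

The diagonal bound is larger than the dense one by exactly a factor $\sqrt{n-m}$, which is the source of the different dependence on $n$ in the two regimes.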

Remark 2. The case where $D$ is dense is general. Therefore, the derived condition on $\nu$ can be used even if
$D$ is actually diagonal. Moreover, if $D$ is diagonal, then we see that the condition on $\nu$ is considerably less
restrictive. We note here that $\{f_{q,l}\}_{l=1}^{n-m}$ will typically be correlated if $m$ is fixed and $n$ is allowed to increase
to a large value. This arises due to the requirement

\[ \mathbb{E}[f_{q,l}(\bar{x}) f_{q,k}(\bar{x})] = 0, \quad \text{with } l,k = 1,\dots,n-m, \ l \neq k, \]

where a large value of $n$ and a small value of $m$ result in more equations than degrees of freedom. Hence,
in order to have $D$ diagonal, the manifold dimension $m$ needs to be sufficiently large. Note that $D$ can also
be sparsely correlated, in which case the sufficient condition on $\nu$ lies in between the two bounds stated in
Lemma 2.

We now want to find a lower bound on the number of samples $K$ which guarantees that the deviation
$|\angle \hat{T}_P S, T_P S|$ is suitably upper bounded with high probability. Hence, we first derive a bound on $K$ that
guarantees some tail bounds on the eigenvalues of the submatrices of $M^{(K)}$. This bound is precisely stated
in the following lemma.

Lemma 3. Let the sampling width be chosen such that $\nu < \nu_{bound,quad} = 1/\sqrt{3RL}$. Let $s_1 \in (0,1)$, $s_2 > e$,
$s_3 > 0$ and $0 < p_1, p_2, p_3 < 1$ denote fixed constants. We define

K blnd = (fT^)2 lo S ((n-m+ l)/pi) , 

K (2) Rp log((n-m)/p 2 ) 

bound S2RL log( S2 /e) 



Hound = Iog(»/P3), 



where 



Rm =m+\(n- m)mV|/C ma:c | 2 , Rn = \{n- m)m*\K max \\ 



T3 m 2 \K max \ 2 f R(5m + 4) ) 1 3/2 , 

-R(j = — max <^ {n - m), — > , and R B = -m ' Vn - m|K, ma:E |. 

Then, let $K_{bound} = \max\{K^{(1)}_{bound}, K^{(2)}_{bound}, K^{(3)}_{bound}\}$. If the number of samples $K$ satisfies $K > K_{bound}$, then
the following bounds hold true with probability at least $1 - p_1 - p_2 - p_3$:

(i) $\lambda_m(A^{(K)}) \geq s_1 \frac{\nu^2}{3}$,

(ii) $\rho(D^{(K)}) \leq s_2\,\rho(D)$,

(iii) $\|B^{(K)}\| \leq s_3$.

The proof of Lemma 3 is presented in Appendix A.2. The lemma defines a sufficient bound on the
sampling density, which in turn guarantees probabilistic bounds on the spectral norms of the perturbation
matrices $B^{(K)}$ and $D^{(K)}$. Our proof builds on the recent results [3T], [55], which give tail bounds on the
eigenvalues of sums of independent random matrices.

We have seen earlier that, if $\nu$ is chosen to satisfy the appropriate bound on the sampling width, then
$\mathbb{P}(|\angle \hat{T}_P S, T_P S| > 0) \to 0$ as $K \to \infty$. We now employ Lemma 3 to show that, if $\nu < c\,\nu_{bound,quad}$ for
some $c < e^{-1/2}$, and if $s_3 > 0$ is suitably upper bounded, then for $K > K_{bound}$, we have that $|\angle \hat{T}_P S, T_P S|$
is bounded from above with high probability. This is stated precisely in Theorem 3.

Theorem 3. Consider $K$ points randomly sampled in $\mathcal{N}_\epsilon(P)$ such that their projections to $T_P S$ are
independent and uniform in the region $[-\nu,\nu]^m$, i.e.,

\[ x_j^{(i)} \sim \mathcal{U}[-\nu,\nu] \ \text{i.i.d.}, \quad i = 1,\dots,K, \ j = 1,\dots,m. \]

Under the notation defined earlier, assume that, for some fixed $s_1 \in (0,1)$ and $s_2 > e$,

\[ \nu < \sqrt{\frac{s_1}{s_2}}\,\nu_{bound,quad} = \sqrt{\frac{s_1}{3 s_2 R L}}. \]

Then, consider that, for some $\tau \in (0,1)$,

\[ 0 < s_3 < \left(s_1\frac{\nu^2}{3} - s_2 R L \nu^4\right)\frac{\tau}{\sqrt{m}}. \]

Finally, let $0 < p_1, p_2, p_3 < 1$. Then, if $K > K_{bound}$, we have that

\[ \mathbb{P}\left(|\angle \hat{T}_P S, T_P S| \leq \cos^{-1}\sqrt{(1-\tau^2)^m}\right) \geq 1 - p_1 - p_2 - p_3. \]

The proof is presented in Appendix A.3. In the proof, we use the conditions derived in Lemma 3 in order
to obtain eigenvalue separation conditions for the correlation matrix constructed with a finite sampling,
which are then used to derive a bound on $\|U_2\|$. Note that the error term $\tau$ is the variance error arising
due to finite sampling; it goes to zero as $K \to \infty$.

4.3. Analysis of the bounds for quadratic embedding. We now proceed to analyze the dependency
of the sampling parameters $\nu$ and $K$ on the manifold dimension $m$, the maximum curvature $|\mathcal{K}_{\max}|$ and the
ambient space dimension $n$, where we assume that $n$ is high (i.e., $n \to \infty$). We analyze this by considering
two separate cases based on the structure of the matrix $D$.

(1) $D$ is diagonal. We first observe that $\nu_{bound,quad}$ is independent of the dimension $n$. In particular, we
have $\nu_{bound,quad} = O(m^{-1}|\mathcal{K}_{\max}|^{-1})$. Using this, one obtains from the corresponding expression of $s_3$
that

\[ s_3 = O(m^{-5/2}|\mathcal{K}_{\max}|^{-2}\tau). \]

For a given probability of success, we derive the sampling bound complexity as follows:

\[
\begin{aligned}
K^{(1)}_{bound} &= O(R_M \log n) = O(n\log n), \\
K^{(2)}_{bound} &= O\left(\frac{R_D}{RL}\log n\right) = O(n\log n), \\
K^{(3)}_{bound} &= O\left(\frac{\nu^6_{bound,quad} R_\sigma + \nu^3_{bound,quad} R_B s_3}{s_3^2}\log n\right) = O(mn\tau^{-2}\log n).
\end{aligned}
\]

Thus $K_{bound} = O(mn\tau^{-2}\log n)$ as $n \to \infty$. Hence, the number of samples has a linear dependence on
the intrinsic dimension of the manifold and a loglinear dependence on the ambient space dimension.

(2) $D$ is dense. When no assumption is made on the structure of $D$, we have $\nu_{bound,quad} = O(n^{-1/2}m^{-1}|\mathcal{K}_{\max}|^{-1})$
as $n \to \infty$. Using this, one obtains from the corresponding expression of $s_3$ that

\[ s_3 = O(n^{-1}m^{-5/2}|\mathcal{K}_{\max}|^{-2}\tau). \]

For a given probability of success, we derive the sampling bound complexity as follows:

\[
\begin{aligned}
K^{(1)}_{bound} &= O(R_M \log n) = O(m\log n), \\
K^{(2)}_{bound} &= O\left(\frac{R_D}{RL}\log n\right) = O(\log n), \\
K^{(3)}_{bound} &= O\left(\frac{\nu^6_{bound,quad} R_\sigma + \nu^3_{bound,quad} R_B s_3}{s_3^2}\log n\right) = O(\tau^{-2}m^2\log n).
\end{aligned}
\]

Thus $K_{bound} = O(\tau^{-2}m^2\log n)$ as $n \to \infty$. Here, the number of samples is seen to depend quadratically
on the manifold dimension and logarithmically on the ambient space dimension. Note that the dependency
on $n$ is milder in this case in comparison to the case where $D$ is diagonal, which is due to the fact
that the condition on the sampling width $\nu$ is stricter when $D$ is dense.
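To make the contrast between the two regimes concrete, the following sketch evaluates only the leading-order terms quoted above, with every constant dropped. The function names are hypothetical and the returned numbers are meaningful only as ratios, not as actual sample counts.

```python
import math

def width_order_quadratic(m, n, k_max, dense):
    """Leading-order sampling width (constants dropped)."""
    base = 1.0 / (m * k_max)
    return base / math.sqrt(n) if dense else base

def samples_order_quadratic(m, n, tau, dense):
    """Leading-order sample complexity K_bound (constants dropped)."""
    if dense:
        return m**2 * math.log(n) / tau**2      # O(tau^-2 m^2 log n)
    return m * n * math.log(n) / tau**2         # O(m n tau^-2 log n)

m, n, k_max, tau = 5, 1000, 10.0, 0.1
ratio_K = samples_order_quadratic(m, n, tau, False) / samples_order_quadratic(m, n, tau, True)
ratio_nu = width_order_quadratic(m, n, k_max, False) / width_order_quadratic(m, n, k_max, True)
print(f"K(diagonal)/K(dense) ~ n/m = {ratio_K:.1f}")
print(f"nu(diagonal)/nu(dense) ~ sqrt(n) = {ratio_nu:.2f}")
```

The trade-off is visible in the two ratios: the dense case needs fewer samples by a factor of order $n/m$, but only because its admissible neighborhood is narrower by a factor of order $\sqrt{n}$.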

4.4. Accuracy of tangent space estimation for smooth embeddings. In the previous section, we have
assumed that $\mathcal{N}_\epsilon(P)$ can be represented with quadratic forms. We now consider the more general scenario
where the manifold is smoothly embedded in $\mathbb{R}^n$. In particular, we assume the smoothness class $C^r$, where
$r > 2$, in order to be able to study the influence of the local curvature of the manifold. Under this assumption
we have that $f_l(\cdot) = f_{q,l}(\cdot) + R_l(\cdot)$ for $l = 1,\dots,n-m$, where $f_{q,l}(\cdot)$ denotes the second order approximation
of $f_l(\cdot)$ and $R_l(\cdot)$ denotes the higher-order terms. As each $f_l(\cdot)$ is defined over a compact domain, $R_l(\cdot)$ is
bounded, i.e., $R_l(\cdot) = O(\|\cdot\|_2^3)$ for all $l = 1,\dots,n-m$. Hence,

\[ |R_l(\cdot)| \leq C_{s,l}\,\|\cdot\|_2^3, \quad l = 1,\dots,n-m, \]

where the constant $C_{s,l} > 0$ depends on the magnitude of the third order derivatives of $f_l$ in $\mathcal{N}_\epsilon(P)$. We
denote

\[ C_s = \max_l C_{s,l}, \quad l = 1,\dots,n-m. \]

Let us again consider the points $\{P_i\}_{i=1}^{K}$ to be formed by sampling independently and uniformly at random
in $T_P S$ such that

\[ x_j^{(i)} \sim \mathcal{U}[-\nu,\nu] \ \text{i.i.d.} \quad \forall\, i = 1,\dots,K \ \text{and} \ j = 1,\dots,m. \]

Using the same notation as before, when $f_l(\cdot) = f_{q,l}(\cdot) + R_l(\cdot)$, $l = 1,\dots,n-m$, we arrive at the following
form for the local covariance matrix $M^{(K)}$:

\[ M^{(K)} = M_q^{(K)} + \Delta^{(K)}, \]

where

\[ M_q^{(K)} = \begin{bmatrix} A^{(K)} & B^{(K)} \\ B^{(K)T} & D^{(K)} \end{bmatrix} \]

is the covariance matrix considered in the previous section, with the submatrices $B^{(K)}$ and $D^{(K)}$ representing
the error on account of the manifold's curvature at $P$. Furthermore,

\[ \Delta^{(K)} = \begin{bmatrix} 0 & B_1^{(K)} \\ B_1^{(K)T} & D_1^{(K)} \end{bmatrix} \]

is an additional error term arising on account of the higher-order Taylor series terms of the mappings $\{f_l\}_{l=1}^{n-m}$,
with

\[
B_1^{(K)} = \frac{1}{K}
\begin{bmatrix}
\sum_i x_1^{(i)} R_1(\xi_{1,i}) & \cdots & \sum_i x_1^{(i)} R_{n-m}(\xi_{n-m,i}) \\
\vdots & & \vdots \\
\sum_i x_m^{(i)} R_1(\xi_{1,i}) & \cdots & \sum_i x_m^{(i)} R_{n-m}(\xi_{n-m,i})
\end{bmatrix}
\]

and

\[
[D_1^{(K)}]_{l,k} =
\begin{cases}
\frac{1}{K}\sum_i \left(R_l(\xi_{l,i}) R_k(\xi_{k,i}) + f_{q,l}(\bar{x}_i) R_k(\xi_{k,i}) + f_{q,k}(\bar{x}_i) R_l(\xi_{l,i})\right) & \text{if } l \neq k, \\
\frac{1}{K}\sum_i \left(R_l(\xi_{l,i})^2 + 2 f_{q,l}(\bar{x}_i) R_l(\xi_{l,i})\right) & \text{if } l = k.
\end{cases}
\]

To begin with, let us define

\[ \delta(\nu) = C_s m^{3/2}\nu^3, \]

where the factor $m^{3/2}$ appears since $\|\bar{x}_i\|_2^3 \leq m^{3/2}\nu^3$ for $i = 1,\dots,K$. We then observe that each entry of
$\Delta^{(K)}$ can be bounded as

(1) $|[B_1^{(K)}]_{j,l}| \leq \nu\,\delta(\nu)$ for $j = 1,\dots,m$; $l = 1,\dots,n-m$

(2) $|[D_1^{(K)}]_{l,k}| \leq \delta(\nu)^2 + \delta(\nu)\,m\nu^2|\mathcal{K}_{\max}|$ for $l,k = 1,\dots,n-m$

where we used the fact that $|x_j^{(i)}| \leq \nu$ and $|R_l(\cdot)| \leq \delta(\nu)$ for obtaining (1); and $|f_{q,l}(\cdot)| \leq \frac{1}{2}m\nu^2|\mathcal{K}_{\max}|$ for
obtaining (2). Using the bounds on the entries, we obtain the following bounds on $\|B_1^{(K)}\|_F$ and $\|D_1^{(K)}\|_F$
respectively:

\[
\begin{aligned}
\|B_1^{(K)}\|_F &\leq \sqrt{m(n-m)}\,\nu\,\delta(\nu) = \sqrt{m(n-m)}\,C_s m^{3/2}\nu^4 = \|B_1\|_{F,bound}, \\
\|D_1^{(K)}\|_F &\leq (n-m)\left(\delta(\nu)^2 + \delta(\nu)\,m\nu^2|\mathcal{K}_{\max}|\right) \\
&= (n-m)\,C_s m^{5/2}\nu^5\left(C_s m^{1/2}\nu + |\mathcal{K}_{\max}|\right) = \|D_1\|_{F,bound}.
\end{aligned}
\]
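The two perturbation bounds are simple enough to evaluate numerically. The sketch below, with hypothetical function names and example parameters (the value of $C_s$ in particular is an arbitrary assumption), computes them from the entrywise bounds and confirms that they agree with the factored closed forms above.

```python
import math

def delta(nu, m, c_s):
    """Bound on |R_l|: delta(nu) = C_s * m^(3/2) * nu^3."""
    return c_s * m**1.5 * nu**3

def b1_frobenius_bound(nu, m, n, c_s):
    """sqrt(m(n-m)) entries, each bounded by nu * delta(nu)."""
    return math.sqrt(m * (n - m)) * nu * delta(nu, m, c_s)

def d1_frobenius_bound(nu, m, n, c_s, k_max):
    """(n-m) times the entrywise bound delta^2 + delta * m * nu^2 * |K_max|."""
    d = delta(nu, m, c_s)
    return (n - m) * (d**2 + d * m * nu**2 * k_max)

m, n, k_max, c_s, nu = 5, 100, 10.0, 1.0, 0.005   # example values (assumed)
b1 = b1_frobenius_bound(nu, m, n, c_s)
d1 = d1_frobenius_bound(nu, m, n, c_s, k_max)

# Factored closed forms for comparison.
b1_closed = math.sqrt(m * (n - m)) * c_s * m**1.5 * nu**4
d1_closed = (n - m) * c_s * m**2.5 * nu**5 * (c_s * math.sqrt(m) * nu + k_max)
print(f"||B1||_F,bound = {b1:.3e}, ||D1||_F,bound = {d1:.3e}")
```

Both bounds vanish when $C_s = 0$, which recovers the purely quadratic setting of Section 4.2.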



Now let us denote

\[ B_1 = \mathbb{E}[B_1^{(K)}], \quad D_1 = \mathbb{E}[D_1^{(K)}], \quad \text{and} \quad \Delta = \mathbb{E}[\Delta^{(K)}]. \]

Due to the ergodicity of the sampling process, we have $B_1 = \lim_{K\to\infty} B_1^{(K)}$, $D_1 = \lim_{K\to\infty} D_1^{(K)}$, and
$\Delta = \lim_{K\to\infty} \Delta^{(K)}$. Since the bounds on the entries of the perturbation submatrices $B_1^{(K)}$ and $D_1^{(K)}$ hold
for all $K$, they are also valid for the entries of $B_1$ and $D_1$. Therefore, we get $\|B_1\|_F \leq \|B_1\|_{F,bound}$ and
$\|D_1\|_F \leq \|D_1\|_{F,bound}$.

We first consider the case $K = \infty$, where we obtain $M = \lim_{K\to\infty} M^{(K)} = M_q + \Delta$. It was shown in
Lemma 2 that

\[
M_q = \begin{bmatrix} \frac{\nu^2}{3} I_{m\times m} & 0_{m\times(n-m)} \\ 0_{(n-m)\times m} & D_{(n-m)\times(n-m)} \end{bmatrix},
\]

where $[D]_{l,k} = \mathbb{E}[f_{q,l}(\bar{x}) f_{q,k}(\bar{x})]$, for $l,k = 1,\dots,n-m$. Given that $M_q$ is now 'perturbed' by $\Delta$, Lemma 4
states the conditions on the sampling width $\nu$ that guarantee an upper bound on the angle between $\hat{T}_P S$
and $T_P S$.

Lemma 4. Let the sampling width satisfy

\[ \nu < \frac{1}{\left[3\left((\beta_2 + RL) + \beta_3\alpha + \beta_4\alpha^2\right)\right]^{1/2}}, \]

where $\beta_2 = 4C_s m^2(n-m)^{1/2}$, $\beta_3 = 2(n-m)C_s m^{5/2}|\mathcal{K}_{\max}|$, $\beta_4 = 2(n-m)m^3C_s^2$ and

\[ \alpha = \min\left\{(3(\beta_2 + RL))^{-1/2},\ (3\beta_3)^{-1/3},\ (3\beta_4)^{-1/4}\right\}. \]

Then, as $K \to \infty$,

\[ \mathbb{P}\left(|\angle \hat{T}_P S, T_P S| > \cos^{-1}\sqrt{(1 - m\sigma_\infty^2)^m}\right) \to 0, \]

where

\[ \sigma_\infty = \frac{\|B_1\|_{F,bound}}{\left(\frac{\nu^2}{3} - RL\nu^4\right) - 2\left(\|B_1\|_{F,bound} + \|D_1\|_{F,bound}\right)}. \]

The proof is presented in Appendix A.4. The main idea here is to ensure that the spectrum associated
with $\frac{\nu^2}{3} I_m$ is separated from the spectrum of the error arising due to the following factors:

(1) The curvature components $\{f_{q,l}\}_{l=1}^{n-m}$, which give rise to the correlation matrix $D$.

(2) The higher-order Taylor series terms of the smooth mappings $\{f_l\}_{l=1}^{n-m}$, giving rise to the perturbation
matrix $\Delta$.

Observe that, unlike in the case where the $f_l$'s are quadratic forms, the deviation $|\angle \hat{T}_P S, T_P S|$ now does not
converge to zero but to a residual bound $\cos^{-1}\sqrt{(1 - m\sigma_\infty^2)^m}$. This is on account of the additional error
associated with the matrix $\Delta$, which now perturbs the covariance matrix $M_q$. The error term $m\sigma_\infty^2$ can be
interpreted as the bias error arising due to the nonzero sampling width. In particular, it is easily verifiable
that

\[ \sigma_\infty \to 0 \quad \text{as} \quad \nu \to 0. \]

Also note that, had the $f_l$'s been quadratic forms, we would have $C_s = 0$, resulting in $\sigma_\infty = 0$. This gives
us the result obtained in Lemma 2.

Remark 3. We remark here that the choice of the sampling width $\nu$ satisfying the condition in Lemma 4
actually ensures the following bound:

\[ \frac{\nu^2}{3} - RL\nu^4 > 4\,\|B_1\|_{F,bound} + 2\,\|D_1\|_{F,bound}. \]

Using this implication in the expression for $\sigma_\infty$, we obtain the trivial bound $\sigma_\infty < 1/2$. Furthermore, the
residual angle bound term can be made arbitrarily small by choosing a sufficiently small $\nu$.
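The width condition of Lemma 4 and the resulting bias term can be evaluated directly. The following sketch is only an illustration under assumed parameter values; the function name, the `shrink` factor and the choice $C_s = 1$ are hypothetical. It checks numerically that $0 < \sigma_\infty < 1/2$ just below the width bound and that $\sigma_\infty$ decreases when the width is reduced.

```python
import math

def smooth_width_and_bias(m, n, k_max, c_s, dense=True, shrink=0.99):
    """Return (nu, sigma_inf) with nu = shrink * (width bound of Lemma 4).

    Requires c_s > 0; for c_s = 0 the embedding is quadratic and sigma_inf = 0.
    """
    R = (n - m) if dense else 1
    L = m * (5 * m + 4) * k_max**2 / 180.0
    beta2 = 4.0 * c_s * m**2 * math.sqrt(n - m)
    beta3 = 2.0 * (n - m) * c_s * m**2.5 * k_max
    beta4 = 2.0 * (n - m) * m**3 * c_s**2
    alpha = min((3.0 * (beta2 + R * L)) ** -0.5,
                (3.0 * beta3) ** (-1.0 / 3.0),
                (3.0 * beta4) ** -0.25)
    nu_max = 1.0 / math.sqrt(3.0 * ((beta2 + R * L) + beta3 * alpha + beta4 * alpha**2))
    nu = shrink * nu_max

    # Frobenius-norm bounds on the higher-order perturbation blocks.
    b1 = math.sqrt(m * (n - m)) * c_s * m**1.5 * nu**4
    d1 = (n - m) * c_s * m**2.5 * nu**5 * (c_s * math.sqrt(m) * nu + k_max)

    gap = (nu**2 / 3.0 - R * L * nu**4) - 2.0 * (b1 + d1)
    return nu, b1 / gap

nu_a, sigma_a = smooth_width_and_bias(5, 100, 10.0, 1.0, shrink=0.99)
nu_b, sigma_b = smooth_width_and_bias(5, 100, 10.0, 1.0, shrink=0.5)
print(f"nu = {nu_a:.5f}: sigma_inf = {sigma_a:.4f}")
print(f"nu = {nu_b:.5f}: sigma_inf = {sigma_b:.4f}")
```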

We now proceed to the case $K < \infty$. Theorem 4, which is the main sampling theorem of this section,
states the sufficient conditions on the sampling width $\nu$ and the number of samples $K$, such that the deviation
$|\angle \hat{T}_P S, T_P S|$ is suitably upper bounded with high probability.

Theorem 4. Consider $K$ points randomly sampled in $\mathcal{N}_\epsilon(P)$ such that their projections to $T_P S$ are
independent and uniform in the region $[-\nu,\nu]^m$, i.e.,

\[ x_j^{(i)} \sim \mathcal{U}[-\nu,\nu] \ \text{i.i.d.}, \quad i = 1,\dots,K, \ \text{and} \ j = 1,\dots,m. \]

Under the notation defined earlier, assume that for some fixed $s_1 \in (0,1)$ and $s_2 > e$, the following holds:

\[ \nu < \left[\frac{s_1}{3\left((\beta_2 + s_2 RL) + \beta_3\alpha + \beta_4\alpha^2\right)}\right]^{1/2} = \nu_{bound,smooth}. \]

Then, for some $\tau \in (0,1)$, let $s_3 > 0$ be chosen such that

\[ s_3 < \left[\left(s_1\frac{\nu^2}{3} - s_2 RL\nu^4\right) - 2\left(\|B_1\|_{F,bound} + \|D_1\|_{F,bound}\right)\right]\left(\frac{\tau^2}{m} + \sigma_f^2\right)^{1/2} - \|B_1\|_{F,bound}, \]

where

\[ \sigma_f = \frac{\|B_1\|_{F,bound}}{\left(s_1\frac{\nu^2}{3} - s_2 RL\nu^4\right) - 2\left(\|B_1\|_{F,bound} + \|D_1\|_{F,bound}\right)}. \]

Finally, let $0 < p_1, p_2, p_3 < 1$. Then, if the number of samples satisfies $K > K_{bound}$, where $K_{bound}$ is as
derived in Lemma 3, the following holds true:

\[ \mathbb{P}\left(|\angle \hat{T}_P S, T_P S| \leq \cos^{-1}\sqrt{(1 - \tau^2 - m\sigma_f^2)^m}\right) \geq 1 - p_1 - p_2 - p_3. \]

The proof is presented in Appendix A.5 and is built on the results of Lemma 4. It uses similar ideas to those
in the proof of Theorem 3; however, the additional perturbation matrix $\Delta$ also plays a role in the derived
bounds. Note that the approximation error consists of two terms: the variance term $\tau$ due to finite sampling
and the bias term $\sigma_f$ arising due to the nonzero sampling width $\nu$.

Remark 4. We again remark here that the choice of the sampling width $\nu$ in Theorem 4 ensures the following
bound:

\[ \left(s_1\frac{\nu^2}{3} - s_2 RL\nu^4\right) > 4\,\|B_1\|_{F,bound} + 2\,\|D_1\|_{F,bound}. \]

Hence, it follows trivially that $\sigma_f < 1/2$. Furthermore, $\sigma_f$ can be reduced appropriately by choosing a suitably
downscaled sampling width. In particular, as shown in Section 4.5, in the worst case $\sigma_f$ is $O(n^{-1/6})$ for
large $n$. This implies that the effect of the bias error $m\sigma_f^2$ on the overall performance is typically mild.

4.5. Analysis of the bounds for smooth embedding. We now analyze the complexity of the parameters
involved in the sampling analysis for large $n$ (i.e., $n \to \infty$). We first observe the following for the perturbation
terms $\beta_2$, $\beta_3$ and $\beta_4$ in Lemma 4:

\[ \beta_2 = O(n^{1/2}m^2), \quad \beta_3 = O(nm^{5/2}|\mathcal{K}_{\max}|), \quad \beta_4 = O(nm^3). \tag{4.3} \]

We now proceed by analyzing two different scenarios depending on the structure of the matrix $D$.

(1) $D$ is diagonal. It can be verified that the bound on the sampling width has the complexity

\[ \nu_{bound,smooth} = O((\beta_3\alpha)^{-1/2}) = O(n^{-1/3}m^{-5/6}|\mathcal{K}_{\max}|^{-1/3}). \]

This is in contrast to the quadratic embedding case, where we have seen that $\nu_{bound,quad}$ is independent
of $n$. Moving on, we have the following complexities for the perturbation bounds $\|B_1\|_{F,bound}$ and
$\|D_1\|_{F,bound}$:

\[
\begin{aligned}
\|B_1\|_{F,bound} &= O(n^{1/2}m^2\nu^4_{bound,smooth}) = O(n^{-5/6}m^{-4/3}|\mathcal{K}_{\max}|^{-4/3}), \\
\|D_1\|_{F,bound} &= O(nm^{5/2}\nu^5_{bound,smooth}|\mathcal{K}_{\max}|) = O(n^{-2/3}m^{-5/3}|\mathcal{K}_{\max}|^{-2/3}).
\end{aligned}
\]

From these orders of dependency, we arrive at the following complexity for the 'residual' angle bound
term $\sigma_f$:

\[ \sigma_f = O\left(\frac{\|B_1\|_{F,bound}}{\nu^2_{bound,smooth}}\right) = O(n^{-1/6}m^{1/3}|\mathcal{K}_{\max}|^{-2/3}). \]

Observe that $\sigma_f$ decays with the increase in $n$, which is due to the decrease in $\nu_{bound,smooth}$. Notice also
that $\sigma_f$ gets smaller when the maximum curvature $|\mathcal{K}_{\max}|$ increases. This can be intuitively explained
as follows. For higher values of the curvature, the second-order terms in the function expansions become
more significant compared to the higher-order terms. This causes the surface to be closer to quadratic
and therefore reduces the bias term $\sigma_f$ that is due to the non-quadraticity of the surface. We now study
the sampling complexity by analyzing $K^{(1)}_{bound}$, $K^{(2)}_{bound}$ and $K^{(3)}_{bound}$ separately. It can be easily verified
that

\[ K^{(1)}_{bound} = O(R_M\log n) = O(n^{1/3}m^{1/3}|\mathcal{K}_{\max}|^{4/3}\log n), \]

and

\[ K^{(2)}_{bound} = O\left(\frac{R_D}{RL}\log n\right) = O(n\log n). \]

Furthermore, by observing that

\[ s_3 = O(\nu^2_{bound,smooth}\,m^{-1/2}\tau) = O(n^{-2/3}m^{-13/6}|\mathcal{K}_{\max}|^{-2/3}\tau), \tag{4.4} \]

we have

\[ K^{(3)}_{bound} = O\left(\frac{R_B\,\nu^3_{bound,smooth}\,s_3 + \nu^6_{bound,smooth}R_\sigma}{s_3^2}\log n\right) = O(n^{1/3}m^{4/3}|\mathcal{K}_{\max}|^{4/3}\tau^{-2}\log n). \]

Since $K_{bound} = \max\{K^{(1)}_{bound}, K^{(2)}_{bound}, K^{(3)}_{bound}\}$, we have

\[ K_{bound} = O(K^{(2)}_{bound}) = O(n\log n) \quad \text{as } n \to \infty. \]

Hence, the number of samples has a loglinear dependence on the ambient space dimension. In fact,
although $K_{bound}$ is chosen according to $K^{(2)}_{bound}$, this choice of $K_{bound}$ implies that $s_3$ can be chosen up
to the order $s_3 = O(n^{-1}m^{-3/2})$ by retaining the value of $K_{bound}$. Comparing this with the expression
in (4.4), we see that the variance error term is $\tau = O(n^{-1/3}m^{2/3})$. Therefore, as we consider that $n$ is
large, we obtain

\[ \tau^2 + m\sigma_f^2 = O(m\sigma_f^2) = O(n^{-1/3}m^{5/3}|\mathcal{K}_{\max}|^{-4/3}), \]

which means that the error resulting from the finite sampling is negligible compared to the error due to
the high-order terms in the Taylor expansion of the local manifold approximation.

(2) $D$ is dense. When the positive semidefinite matrix $D$ is dense, the sampling width has complexity

\[ \nu_{bound,smooth} = O(n^{-1/2}m^{-1}|\mathcal{K}_{\max}|^{-1}). \]

This is similar to the bound in the quadratic embedding case. Next, we have the following complexities
for the perturbation bounds $\|B_1\|_{F,bound}$ and $\|D_1\|_{F,bound}$:

\[
\begin{aligned}
\|B_1\|_{F,bound} &= O(n^{1/2}m^2\nu^4_{bound,smooth}) = O(n^{-3/2}m^{-2}|\mathcal{K}_{\max}|^{-4}), \\
\|D_1\|_{F,bound} &= O(nm^{5/2}\nu^5_{bound,smooth}|\mathcal{K}_{\max}|) = O(n^{-3/2}m^{-5/2}|\mathcal{K}_{\max}|^{-4}).
\end{aligned}
\]

Finally, we obtain the following complexity for the 'residual' angle bound term $\sigma_f$:

\[ \sigma_f = O\left(\frac{\|B_1\|_{F,bound}}{\nu^2_{bound,smooth}}\right) = O(n^{1/2}m^2\nu^2_{bound,smooth}) = O(n^{-1/2}|\mathcal{K}_{\max}|^{-2}). \]

We observe that $\sigma_f$ decays at a faster rate compared to the case where $D$ is diagonal. The order of the
dependency of the sampling width bound on $n$ is higher in this case, which in turn implies a stricter
bound on the high-order terms in the Taylor expansion. Finally, since the dependency of $\nu_{bound,smooth}$
on $n$, $m$, and $|\mathcal{K}_{\max}|$ is the same as that of $\nu_{bound,quad}$, we obtain the same sampling complexity as in
the quadratic embedding case:

\[ K_{bound} = O(\tau^{-2}m^2\log n) \quad \text{as } n \to \infty. \]

Therefore, the number of samples has a quadratic dependence on the manifold dimension and a
logarithmic dependence on the ambient space dimension.

5. Experimental Results 

In this section we present experimental results for the empirical validation of the sampling conditions derived
in the preceding sections. For the sake of brevity, we use the notation $\theta = \angle \hat{T}_P S, T_P S$ to describe the angle
between $\hat{T}_P S$ and $T_P S$. Recall that, for any point $P$ lying on a smooth $m$-dimensional
manifold $S$ in $\mathbb{R}^n$, the points lying in the neighborhood $\mathcal{N}_\epsilon(P)$ of $P$ have the following representation:

\[ [\bar{x}^T \ f_1(\bar{x}) \ \cdots \ f_{n-m}(\bar{x})]^T; \quad f_l : T_P S \to \mathbb{R}. \]

In the experiments, we study different manifold embeddings, where the functions $f_l$ have the following
form:

(1) Quadratic form: $f_l = \frac{1}{2}\sum_{j=1}^{m}\mathcal{K}_{l,j}x_j^2$

(2) Smooth mapping 1: $f_l = 1 - \exp\left(-\frac{1}{2}\sum_{j=1}^{m}\mathcal{K}_{l,j}x_j^2\right)$

(3) Smooth mapping 2: $f_l = \sin\left(\frac{1}{2}\sum_{j=1}^{m}\mathcal{K}_{l,j}x_j^2\right)$

In particular, we consider the mappings $\{f_l\}_{l=1}^{n-m}$ to be all of the same form. Furthermore, we focus
on the general case where $\{f_{q,l}\}_{l=1}^{n-m}$ are correlated, or equivalently $D$ is dense, which is the most generic
scenario. Then, for a given value of $\mathcal{K}_{\max}$, we select the principal curvatures $(\mathcal{K}_{l,1},\dots,\mathcal{K}_{l,m})$ randomly from
the interval $[0,|\mathcal{K}_{\max}|]^m$ and then randomly assign the same sign ($+$ or $-$) to the elements of $\{\mathcal{K}_{l,j}\}_{j=1}^{m}$.

We sample the points as explained in Section 4. We compute the tangent space with these sample points
and compare the resulting estimate with the true tangent space by measuring the angle between the two
subspaces. Then we analyze the results from the perspective of the theoretical bounds on the width of the
sampling regions and on the number of samples, which have been derived earlier in the paper. In particular,
we consider the sampling width to have the value $\nu = \gamma\,\nu_{bound,quad}$, where

\[ \nu_{bound,quad} = \sqrt{\frac{60}{m(n-m)(5m+4)|\mathcal{K}_{\max}|^2}}. \]

The choice of $\nu_{bound,quad}$ for the reference sampling width is due to the fact that it can be easily computed
and it also provides a basis for comparing smooth embeddings with quadratic embeddings, made possible by
varying the scale parameter $\gamma$.

In the first set of experiments, we examine the relation between the estimation error, i.e., the deviation
$|\theta|$, and the sampling density $K$. We fix $m = 5$, $\mathcal{K}_{\max} = 10$ and consider different values for the dimension
of the ambient space, namely $n = 100, 500, 1000$. For each value of $(m,n,\mathcal{K}_{\max})$, we choose the sampling
width as a scaled version of the theoretical bound, i.e., $\nu = \gamma\,\nu_{bound,quad}$. We estimate the tangent space with
$K$ samples and compute the approximation error with respect to the true tangent subspace. The results,
shown in Fig. 3, have been averaged over 25 random trials for each value of $K$, where $K$ is varied from 100
to 2000 in steps of 100.
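A scaled-down version of this first experiment can be sketched as follows. This is not the code used for the reported figures: it assumes that the three mappings are $q_l$, $1 - \exp(-q_l)$ and $\sin(q_l)$ with $q_l = \frac{1}{2}\sum_j \mathcal{K}_{l,j}x_j^2$, uses axis-aligned principal directions, and runs fewer trials and fewer values of $K$ than described above. All function names and the choice $\gamma = 1$ are hypothetical.

```python
import numpy as np

# Assumed forms of the three mappings, applied to q_l = 0.5 * sum_j K_{l,j} x_j^2.
MAPPINGS = {
    "quadratic": lambda q: q,
    "smooth1": lambda q: 1.0 - np.exp(-q),
    "smooth2": lambda q: np.sin(q),
}

def nu_bound_quad_dense(m, n, k_max):
    """Reference sampling width for dense D."""
    return np.sqrt(60.0 / (m * (n - m) * (5 * m + 4) * k_max**2))

def mean_angle(mapping, K, n, m, k_max, nu, trials, rng):
    """Average deviation |theta| in degrees over several random trials."""
    angles = []
    for _ in range(trials):
        # Curvatures in [0, k_max]; one random sign shared by each row l.
        curv = rng.uniform(0.0, k_max, size=(n - m, m))
        curv *= rng.choice([-1.0, 1.0], size=(n - m, 1))
        x = rng.uniform(-nu, nu, size=(m, K))          # tangent coordinates
        q = 0.5 * curv @ x**2
        X = np.vstack([x, MAPPINGS[mapping](q)])       # n x K sample matrix
        _, vecs = np.linalg.eigh(X @ X.T / K)          # ascending eigenvalues
        U1 = vecs[:m, -m:]                             # tangent block, leading eigenvectors
        cos2 = np.clip(np.linalg.det(U1.T @ U1), 0.0, 1.0)
        angles.append(np.degrees(np.arccos(np.sqrt(cos2))))
    return float(np.mean(angles))

rng = np.random.default_rng(0)
n, m, k_max, gamma = 100, 5, 10.0, 1.0
nu = gamma * nu_bound_quad_dense(m, n, k_max)
K_values = [100, 500, 2000]

results = {
    name: [mean_angle(name, K, n, m, k_max, nu, trials=5, rng=rng) for K in K_values]
    for name in MAPPINGS
}
for name, angles in results.items():
    print(name, [f"{a:.2f}" for a in angles])
```

For a sampling width at or below the reference bound, the averaged deviation is expected to shrink as $K$ grows, which is the qualitative behavior the experiment is designed to examine.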

We first show in Figures 3a-3c the results obtained for a quadratic embedding. We observe that for
the choice $\nu \approx 1.2\,\nu_{bound,quad}$, $|\theta|$ decreases sharply towards $0°$ with increasing values of $K$. On the
other hand, the choice $\nu \approx 4\,\nu_{bound,quad}$ causes $|\theta|$ to increase towards $90°$. The results are similar across
different values of $n$. Furthermore, since $D$ is dense and $m = 5$ in this experiment, Theorem 3 states that
$K = O(\tau^{-2})$ and $|\theta| = O(\cos^{-1}(1-\tau^2)^{5/2})$. Therefore, the order of the dependence of $|\theta|$ on $K$ is expected
to be $|\theta| = O(\cos^{-1}(1-K^{-1})^{5/2})$. We can see that the plotted curves are in accordance with this theoretical
result. We remark that, in these experiments, the true upper bound on $\nu$ appears to be within a factor $\gamma$ of
$\nu_{bound,quad}$, where $\gamma$ takes a value between 1.2 and 4.

Figures 3d-3f and 3g-3i then show the experimental results for non-quadratic embeddings, in particular,
for the Smooth mapping 1 and Smooth mapping 2 above. Interestingly, the variation of $|\theta|$ with respect to
$K$ for non-quadratic mappings is almost identical to that for quadratic forms.

In the second set of experiments, we study the scaling of the true bound on the sampling width $\nu$ with
the ambient space dimension $n$. To this end, we fix $m = 5$, $\mathcal{K}_{\max} = 10$, and the number of samples is fixed at
a sufficiently large value, i.e., $K = 2000$. Then, we vary $n$ from 100 to 1000 in steps of 50. For each value
of $n$, we first initialize $\nu = 3\,\nu_{bound,quad}$ and then compute the bound on the sampling width by gradually
reducing $\nu$ until $|\theta| < |\theta_{bound}| = 5°$. The value of $|\theta|$ is averaged over 25 random trials.

Fig. 4 shows the variation of $\nu$ with $n$. Importantly, we have observed that for quadratic forms (Fig. 4a),
the true bound $\nu$ on the sampling width compares to its theoretical estimate $\nu_{bound,quad}$ as $\nu = \gamma\,\nu_{bound,quad}$,
where $\gamma$ lies approximately between 1.38 and 1.46. This is moreover seen to be true for smooth mappings
(Figures 4b, 4c), indicating that the true bound on $\nu$ is approximately $O(n^{-1/2})$. This also matches the
theoretical results of Section 4.5.

The behavior of the maximum sampling width in Fig. 4 is similar for quadratic and non-quadratic
mappings. Therefore, it seems possible to use the quadratic sampling width bound $\nu_{bound,quad}$ for the
non-quadratic mappings considered in this experiment. However, this is not necessarily true in general as it
depends on the higher-order characteristics of the non-quadratic mapping.

In the next experiment, we are interested in observing the dependency of the true sampling width bound $\nu$
on the maximum local curvature $\mathcal{K}_{\max}$. We fix $m = 5$, set $K = 2000$ and choose a fixed $n \in \{100, 500, 1000\}$.
We vary $\mathcal{K}_{\max}$ from 0.5 to 10 in steps of 0.5. For each value of $\mathcal{K}_{\max}$, we first initialize $\nu = 3\,\nu_{bound,quad}$ and
then compute the bound on $\nu$ by gradually reducing $\nu$ until $|\theta| < |\theta_{bound}| = 5°$. The value of $|\theta|$ is averaged
over 25 random trials.

FIGURE 3. Variation of the deviation $|\theta|$ with respect to $K$ for different sampling widths $\nu$. For
each type of mapping, $\nu = \gamma\,\nu_{bound,quad}$.

FIGURE 4. Maximum sampling width $\nu$ for which the deviation $|\theta| < |\theta_{bound}| = 5°$ is achieved for
different values of $n$.

Fig. 5 shows the dependency of $\nu$ on $\mathcal{K}_{\max}$ for different values of $n$. Similarly to the previous experiments,
we observe that for quadratic forms, $\nu = \gamma\,\nu_{bound,quad}$, where $\gamma$ is approximately between 1.32 and 1.46
(see Fig. 5a). We note the same behavior for the case of smooth mappings shown in Figures 5b and 5c.
The results indicate that for fixed values of $m$ and $n$, the true bound on $\nu$ matches the theoretical result
$O(|\mathcal{K}_{\max}|^{-1})$ derived in Section 4.5.

(a) Quadratic form (b) Smooth mapping 1 (c) Smooth mapping 2

FIGURE 5. Maximum sampling width $\nu$ for which $|\theta| < |\theta_{bound}| = 5°$ is achieved for different
values of $\mathcal{K}_{\max}$. The results are given for different dimensions of the ambient space $n$.

Then, we investigate the relation between the sampling density $K$ and the embedding dimension $n$ for a
fixed sampling width $\nu$. We choose $\mathcal{K}_{\max} = 10$ and select several values for the dimension of the manifold,
$m \in \{5, 10, 15\}$. We vary $n$ from 100 to 2000 in steps of 100. We set $\nu = \nu_{bound,quad}$, where $\nu_{bound,quad}$ is
evaluated at the largest value of $n$, so that the fixed sampling width $\nu$ is sufficiently small for the range
of $n$ under consideration. We denote the largest value of $n$ by $n_{large}$ in this experiment. For each value of
$(m, n, \mathcal{K}_{\max})$, we compute the minimum number of samples needed in order to have $|\theta| < |\theta_{bound}| = 5°$. The
value of $|\theta|$ is the average of 25 random trials.

FIGURE 6. Minimum number of samples $K$ for which $|\theta| < |\theta_{bound}| = 5°$ is achieved as $n$ is varied.
The results are obtained for different dimensions of the manifold $m$.



Fig. 6 shows the variation of $K$ with respect to $n$ for the different mappings. We see that $K$ increases with the ambient dimension $n$ as expected. Furthermore, for a given $n$, we observe that increasing the dimension of the manifold increases $K$. We now show that this behaviour is well explained by our theoretical results in Section 4. In order to see this, we first note that $\nu = O(n_{\mathrm{large}}^{-1/2} m^{-1} |\mathcal{K}_{\max}|^{-1})$, which is due to the relation $\nu = O(n^{-1/2} m^{-1} |\mathcal{K}_{\max}|^{-1})$ derived in Section 4.3 and the fact that we evaluate $\nu$ at $n = n_{\mathrm{large}}$. Using this value of $\nu$ in the bounds on the sampling density stated in Lemma 3, one can easily verify that

$$K^{(1)}_{\mathrm{bound}} = O\!\left(\left(m + \frac{n}{n_{\mathrm{large}}}\right)\log n\right), \qquad K^{(2)}_{\mathrm{bound}} = O(\log n) \qquad \text{and}$$

$$K^{(3)}_{\mathrm{bound}} = O\!\left(\left(\frac{m\,n}{n_{\mathrm{large}}} + m^{1/2}\sqrt{\frac{n}{n_{\mathrm{large}}}}\right)\log n\right) \approx O\!\left(m\sqrt{\frac{n}{n_{\mathrm{large}}}}\,\log n\right).$$

Since $n \le n_{\mathrm{large}}$, we obtain $K_{\mathrm{bound}} = O\!\left(\left(m + \frac{n}{n_{\mathrm{large}}}\right)\log n\right)$. This closely matches the behavior shown in Figures 6a-6c.
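The order of the first bound can be checked by direct substitution. The following is a sketch under two assumptions: that $K^{(1)}_{\mathrm{bound}}$ scales as $R_M \log n$, with $R_M = m + \frac{1}{4}(n-m)m^2\nu^2|\mathcal{K}_{\max}|^2$ as it appears in the proof of Lemma 3 (Appendix A.2), and that the sampling width is written as $\nu = c\, n_{\mathrm{large}}^{-1/2} m^{-1} |\mathcal{K}_{\max}|^{-1}$ for some constant $c > 0$ (the constant $c$ is introduced here only for illustration).

```latex
\begin{align*}
R_M &= m + \tfrac{1}{4}(n-m)\, m^2\, \nu^2\, |\mathcal{K}_{\max}|^2 \\
    &= m + \tfrac{1}{4}(n-m)\, m^2 \cdot
       \frac{c^2}{n_{\mathrm{large}}\, m^2\, |\mathcal{K}_{\max}|^2}\,
       |\mathcal{K}_{\max}|^2 \\
    &= m + \frac{c^2}{4}\,\frac{n-m}{n_{\mathrm{large}}}
     \;=\; O\!\left(m + \frac{n}{n_{\mathrm{large}}}\right).
\end{align*}
```

Multiplying by $\log n$ gives the order $O\big((m + n/n_{\mathrm{large}})\log n\big)$; the curvature cancels because $\nu$ is inversely proportional to $|\mathcal{K}_{\max}|$.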

Finally, in the last experiment, we look into the dependency of the sampling density $K$ on the curvature term $\mathcal{K}_{\max}$ for a fixed sampling width $\nu$. We set $m = 5$ and pick several values of the ambient space dimension, i.e., $n \in \{100, 500, 1000\}$. We vary $\mathcal{K}_{\max}$ from 0.5 to 10 in steps of 0.5. We fix the value of the sampling width as $\nu = \nu_{\mathrm{bound,quad}}$, where $\nu_{\mathrm{bound,quad}}$ is evaluated at the largest value of $\mathcal{K}_{\max}$ (denoted as $\mathcal{K}_{\max,\mathrm{large}}$ in this experiment). For each value of $(m, n, \mathcal{K}_{\max})$, we compute the minimum number of samples required in order to have $|\theta| < |\theta_{\mathrm{bound}}| = 5^\circ$. The value of $|\theta|$ is averaged over 25 random trials.
Fig. 7 shows the relation between $K$ and $|\mathcal{K}_{\max}|$ for the different mappings. We see that $K$ increases with $|\mathcal{K}_{\max}|$ as expected. Interestingly, we see that, for a fixed value of $\mathcal{K}_{\max}$, a change in the embedding dimension $n$ does not significantly affect $K$. We now show that such a variation of $K$ with $|\mathcal{K}_{\max}|$ for a fixed sampling width is explained by the theoretical results in Section 4. We first note that the sampling width is $\nu = O(n^{-1/2} m^{-1} |\mathcal{K}_{\max,\mathrm{large}}|^{-1})$, which can be obtained from the results of Section 4.3 by evaluating the bound on the sampling width at $\mathcal{K}_{\max} = \mathcal{K}_{\max,\mathrm{large}}$. From Lemma 3, one can then easily verify that

$$K^{(1)}_{\mathrm{bound}} = O\!\left(\left(m + \frac{\mathcal{K}_{\max}^2}{\mathcal{K}_{\max,\mathrm{large}}^2}\right)\log n\right), \qquad K^{(2)}_{\mathrm{bound}} = O(\log n) \qquad \text{and}$$

$$K^{(3)}_{\mathrm{bound}} = O\!\left(\left(m\,\mathcal{K}_{\max}^2\,\mathcal{K}_{\max,\mathrm{large}}^{-2} + m^{1/2}\,|\mathcal{K}_{\max}|\,|\mathcal{K}_{\max,\mathrm{large}}|^{-1}\right)\log n\right) \approx O\!\left(m\,\frac{|\mathcal{K}_{\max}|}{|\mathcal{K}_{\max,\mathrm{large}}|}\,\log n\right).$$

As $|\mathcal{K}_{\max}| \le |\mathcal{K}_{\max,\mathrm{large}}|$, we have $K_{\mathrm{bound}} = O\!\left(\left(m + \frac{\mathcal{K}_{\max}^2}{\mathcal{K}_{\max,\mathrm{large}}^2}\right)\log n\right)$. Thus, for a fixed $n$, the bound on $K$ increases quadratically with $|\mathcal{K}_{\max}|$, which is consistent with the curves presented in Figures 7a-7c.

Furthermore, $K_{\mathrm{bound}}$ depends only logarithmically on $n$, suggesting that a change in $n$ would affect the sampling density only mildly: this also matches the experimental results.



6. Discussion 

In this section, we first discuss our results in view of the recent works from the literature. Then we show 
how our results could be used in practical applications. 

We first position our study relative to the works presented in [18] and [20], which are, to the best of our knowledge, the closest to our paper. In [18], the authors consider a global sampling from a compact manifold and relate the size of the neighborhood $\varepsilon$ to the number of samples $K$ through the condition $\varepsilon = O(K^{-\frac{1}{m+2}})$. From this aspect, our approach is significantly different. Our bound on $\varepsilon$ is derived in the asymptotic limit where $K \to \infty$, so that it depends completely on the local manifold geometry. Furthermore, the analysis in [18] gives soft bounds that do not reflect the effect of the curvature, nor of the ambient space and manifold dimensions, on the sampling conditions. Meanwhile, we derive worst-case bounds on both $\varepsilon$ and $K$ by explicitly taking into account the effect of curvature and dimensions.
The work in [20] is parallel to ours and addresses a similar problem. The analysis is, however, clearly different in two main aspects. Firstly, the analysis in [20] assumes that the manifold is embedded with exactly quadratic forms and that the data consists of samples from the quadratic manifold corrupted with Gaussian noise. On the contrary, the type of manifolds that we consider is more generic, as we assume an embedding of the manifold with arbitrary smooth functions. In particular, we explicitly examine the effect of the deviation of the manifold from its second-order approximation on the accuracy of the tangent space estimation. Secondly, an important difference between the two studies is that the data is already sampled in [20], where the problem consists of choosing the size of the subset of samples used in the tangent space estimation, while we assume that we have a rather direct control on the parameters of the local random sampling (sampling width and number of samples). Therefore, in [20], the number of samples $N$ (which is $K$ in our notation) and the sampling radius $r$ (which is comparable to the sampling width $\nu$ in our notation) are directly dependent on each other. As the sampling is formulated as a subset selection problem, increasing the number of samples necessarily leads to choosing samples from a larger radius. The analysis is based on the assumption $r = cN^{1/d}$, where $c$ is a constant and $d$ is the dimension of the manifold ($m$ in our notation); therefore, $r$ and $N$ can be represented in terms of a single parameter. Meanwhile, in our analysis, we consider a setting where we treat the sampling width $\nu$ and the number of samples $K$ as two different parameters.

Even if the frameworks in [20] and in this paper are quite different, we can try to compare the results. It is assumed in [20] that the subset of samples selected for tangent space estimation corresponds to a sampling radius smaller than a threshold $r_{\max}$, where $r_{\max}$ is the largest radius within which the manifold can be accurately represented with quadratic forms. We give a characterization of such a bound on the sampling width in Lemma 4 for arbitrary smooth manifolds, which is very relevant to the parameter $r_{\max}$ in their work. In [20], the parameter $r_{\max}$ is used as a predetermined constant and the study does not go into the analysis of $r_{\max}$ for non-quadratic manifolds. A direct comparison of the main results in both papers is difficult. However, we can compare the noiseless version of the Interpretable Main Result 1 in [20] and our results on quadratic manifolds in the following way. The denominator of the angle bound in Interpretable Main Result 1 quantifies the separation between the tangential and normal components of the computed eigenspace. Furthermore, the sampling radius must be small enough to guarantee that the eigenvalues corresponding to the tangential components are larger than those corresponding to the normal components. Then, an admissible sampling radius must be below the value of $r$ that equates the denominator of the expression in Interpretable Main Result 1 to zero. Taking the noise variance as zero and observing the relation $\mathcal{K} = O(n^{1/2} m |\mathcal{K}_{\max}|)$, where $\mathcal{K}$ is the curvature parameter in [20], their result translates into the fact that the admissible sampling radius must be smaller than $O(n^{-1/2} m^{-1/2} |\mathcal{K}_{\max}|^{-1})$ with our notation, where $m$, $n$ and $|\mathcal{K}_{\max}|$ are the parameters corresponding respectively to the intrinsic manifold dimension, the ambient space dimension and the curvature. This is in agreement with our result for quadratic embeddings (see Table 1), where we have calculated the admissible sampling width as $O(n^{-1/2} m^{-1} |\mathcal{K}_{\max}|^{-1})$.

Now that our work has been properly positioned with respect to the related work, we discuss the usage 
of our results in practical applications. We can interpret our results in two important application areas, 
namely (i) the discretization of a manifold with a known parametric model - manifold sampling and (ii) the 
recovery of the tangent space of a manifold from a given set of data samples - manifold learning. 

First, in order to use our results in a real application, the intrinsic dimension $m$ of the manifold, the curvature parameter $\mathcal{K}_{\max}$, and the higher-order deviation term $C_s$ have to be known or estimated. In a manifold sampling application, $m$ is already known and it is possible to estimate $\mathcal{K}_{\max}$ in the following ways. If the manifold conforms to a known analytic model, it is easy to compute the values of the principal curvatures and the higher-order terms from the Taylor expansion of the model. If an analytic model is not known for the manifold, the curvature of a manifold of known parameterization can be estimated using results from Riemannian geometry such as [26] (Section V) and [27] (Proposition 2). The results in Section V of [26] are especially compatible with our definition of curvature, where we define $\mathcal{K}_{\max}$ as the largest of the maximum principal curvatures of the hypersurfaces $S_l$, $l = 1, \dots, n-m$, each of which has a single normal direction. Although the work in [26] addresses an image registration problem, the analysis in Section V of [26] is generic and describes a procedure to compute the maximum principal curvature of a manifold corresponding to a single normal direction, which is equal to the norm of the second fundamental form corresponding to that normal direction. Applying this procedure for all $n-m$ normal directions and taking the largest of the maximum principal curvatures, one can compute the exact value of $\mathcal{K}_{\max}$. Then, the deviation term $C_s$ is the maximum of the constants $C_{s,l}$. Once the maximum principal curvature of $S_l$ is computed as above, one can find a suitable bound for $C_{s,l}$ by looking at the deviation of $S_l$ from its second-order approximation.

Second, in a manifold learning application where only data samples are available, $m$, $\mathcal{K}_{\max}$ and $C_s$ are unknown and need to be estimated. The estimation of the intrinsic dimension of a data set has been studied in several works such as [55], [H] and [3U]. It is also possible to obtain an estimate of the curvature from data samples using results such as in [31]. In [31], a method is proposed to estimate the intrinsic dimension of the manifold by examining the variation of the singular values of the data covariance matrix with respect to the radius of the neighborhood of samples used. It is observed that the singular values corresponding to the curvatures can be distinguished from the singular values corresponding to the tangential components by using the fact that the tangential and curvature singular values conform respectively to linear and quadratic fits as a function of the radius. In such a setting, the deviation of the curvature singular values from their quadratic fits for large values of the radius can possibly be related to the deviation term $C_s$.

Finally, in our results, we characterize the admissible sampling width for accurate tangent space estimation in terms of the tangent space distances, i.e., the distances between the projections of points on the tangent space and $P$. In a manifold sampling application, our analysis can be easily adapted to the parametric data model at hand since it assumes that the true tangent space of the manifold is aligned with the subspace generated by the first $m$ canonical basis vectors. This can be achieved by applying a Gram-Schmidt orthonormalization to the tangent vectors of the data manifold and then performing a change of coordinates in $\mathbb{R}^n$ such that the subspace spanned by the original tangent vectors is mapped to the subspace generated by the first $m$ canonical basis vectors. Meanwhile, in a manifold learning application where only data samples are available, one needs to adapt the bounds on the tangent space distance to bounds on the distance between actual data samples in the ambient space. This can be done in different ways. Based on our results, one can easily obtain some worst-case bounds on the ambient space distance by making use of the fact that the tangent space distance is upper bounded by the ambient space distance. This approach is expected to be effective if the ambient space dimension $n$ is comparable to the intrinsic dimension $m$, or if the manifold has small curvature. Alternatively, if $n \gg m$ and the manifold has significant nonlinearity, the current results involving the tangent space distance can be translated into approximate conditions on the ambient space distance with the help of the estimation $\|\cdot\|_{\text{ambient space}} \approx O\big(\|\cdot\|_{\text{tangent space}} \sqrt{n/m}\big)$.
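The change of coordinates mentioned above for the manifold sampling case can be sketched as follows. This is a minimal illustration, not a prescribed implementation: it uses a QR factorization (Householder-based in NumPy) in place of an explicit Gram-Schmidt procedure, which spans the same subspace with its first $m$ columns, and the function names are hypothetical.

```python
import numpy as np


def align_tangent_space(tangent_vectors):
    """Return an orthogonal Q whose first m columns span the given tangent vectors.

    tangent_vectors: (n, m) array with linearly independent columns.
    """
    Q, _ = np.linalg.qr(tangent_vectors, mode="complete")  # Q is n x n
    return Q


def to_aligned_coordinates(points, P, Q):
    """Express each row of `points`, relative to P, in the basis formed by the columns of Q.

    In the new coordinates, the tangent space at P is span(e_1, ..., e_m).
    """
    return (points - P) @ Q
```

After this change of coordinates, the first $m$ coordinates of a point give its tangent space component, so the tangent space distance to $P$ is the Euclidean norm of those $m$ coordinates.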

7. Concluding Remarks 

We have presented a theoretical analysis of the tangent space estimation at a point on a submanifold from 
a set of manifold samples that are selected locally at random. We have considered a setting where the 
manifold is embedded smoothly in $\mathbb{R}^n$ and the tangent space is estimated with local PCA. We have derived
relations between the accuracy of the tangent space estimation and the sampling conditions. In particular,
we have examined the effect of the local curvature of the manifold in tangent space estimation and shown that
the size of the sampling neighborhood should be inversely proportional to the manifold curvature. We have
also seen that sampling conditions are affected by the correlation between the components of the second- 
order approximation of the embedding. The sampling width can be chosen larger when the components 
of the manifold in different dimensions are less correlated. The presented study can be used for obtaining 
performance guarantees in the discretization of parametrizable data and in manifold learning applications. 
Finally, our analysis assumes that the data samples are noiseless, i.e., the data lies exactly on the manifold. 
A future research direction resides therefore in the extension of the current results to a scenario where data 
samples are corrupted with noise. 
Appendix A. $m$-dimensional smooth manifolds in $\mathbb{R}^n$
A.1. Proof of Lemma 2.

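Proof. Observe that each entry of $M^{(K)}$ is the sum of $K$ i.i.d. random variables. Therefore, by the Strong Law of Large Numbers, as $K \to \infty$, $[M^{(K)}]_{i,j}$ converges a.s. to $[M]_{i,j}$ for all $1 \le i,j \le n$, where each entry of $M$ is the expected value of the random variable involved in the summation of the corresponding entry of $M^{(K)}$. Let

$$M = \begin{bmatrix} A & B \\ B^T & D \end{bmatrix}.$$

Consider the entries of $A$. We have for $j, k = 1, \dots, m$,

$$[A]_{j,k} = E[x_j x_k] = \begin{cases} 0 & \text{if } j \neq k, \\ \dfrac{\nu^2}{3} & \text{if } j = k. \end{cases}$$

Consider the entries of $B$. We have for $j = 1, \dots, m$ and $l = 1, \dots, n-m$,

$$[B]_{j,l} = E[x_j f_l(x)] = E\Big[x_j \cdot \frac{1}{2}\sum_{k=1}^{m} \langle x, v_{l,k}\rangle^2\, \mathcal{K}_{l,k}\Big] = \frac{1}{2} E\Big[x_j \sum_{k=1}^{m} (x_1 v_{l,k,1} + \dots + x_m v_{l,k,m})^2\, \mathcal{K}_{l,k}\Big] = 0.$$

The above result follows as each term in the expansion of $x_j f_l(x)$ contains an odd power of at least one coordinate, and the expected value of each term is thus 0. Now, consider the diagonal entries of $D$. We have

$$[D]_{l,l} = E[f_l^2(x)] = \frac{1}{4} E\Big[\Big(\sum_{k=1}^{m} \mathcal{K}_{l,k} \langle x, v_{l,k}\rangle^2\Big)^2\Big] \le \frac{1}{4} |\mathcal{K}_{\max}|^2\, E[\|x\|^4].$$

Furthermore,

$$E[\|x\|^4] = E\Big[\sum_{i} x_i^4 + 2\sum_{i<j} x_i^2 x_j^2\Big] = \frac{m\nu^4}{5} + \frac{m(m-1)\nu^4}{9} = \frac{m(5m+4)\nu^4}{45}.$$

Hence

$$0 \le [D]_{l,l} \le \frac{m(5m+4)\nu^4}{180} |\mathcal{K}_{\max}|^2 \qquad (l = 1, \dots, n-m).$$

We have the following bounds for $l, k = 1, \dots, n-m$, $l \neq k$, on the off-diagonal entries of $D$:

$$[D]_{l,k} = E[f_l(x) f_k(x)] = \frac{1}{4} E\Big[\sum_{i=1}^{m}\sum_{j=1}^{m} \mathcal{K}_{l,i}\mathcal{K}_{k,j} \langle x, v_{l,i}\rangle^2 \langle x, v_{k,j}\rangle^2\Big] \le \frac{1}{4} |\mathcal{K}_{\max}|^2\, E[\|x\|^4] = \frac{m(5m+4)\nu^4}{180} |\mathcal{K}_{\max}|^2.$$

Similarly, it holds that

$$[D]_{l,k} \ge -\frac{m(5m+4)\nu^4}{180} |\mathcal{K}_{\max}|^2.$$

Hence, $M$ has the form

$$M = \begin{bmatrix} \dfrac{\nu^2}{3} I_{m \times m} & 0_{m \times (n-m)} \\ 0_{(n-m) \times m} & D_{(n-m) \times (n-m)} \end{bmatrix},$$

where

$$0 \le [D]_{l,l} \le \frac{m(5m+4)\nu^4}{180} |\mathcal{K}_{\max}|^2, \qquad -\frac{m(5m+4)\nu^4}{180} |\mathcal{K}_{\max}|^2 \le [D]_{l,k} \le \frac{m(5m+4)\nu^4}{180} |\mathcal{K}_{\max}|^2 \quad (l \neq k).$$

Therefore, for $l, k = 1, \dots, n-m$,

$$|[D]_{l,k}| \le \frac{m(5m+4)\nu^4}{180} |\mathcal{K}_{\max}|^2 = [D]_{\mathrm{bound}}.$$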
Observe that the eigenspace of $M$ corresponding to the eigenvalue $\nu^2/3$ is equal to the span of $\{e_1, \dots, e_m\}$, which is the same as $T_P S$. Hence, as $K \to \infty$, we obtain the implication

$$\frac{\nu^2}{3} > \rho(D) \;\Rightarrow\; |\angle \widehat{T}_P S, T_P S| \to 0,$$

where $\rho(D)$ denotes the spectral radius of $D$, which is positive definite. In the case where $D$ is diagonal, we have

$$\rho(D) \le [D]_{\mathrm{bound}} = \frac{m(5m+4)\nu^4}{180} |\mathcal{K}_{\max}|^2.$$

Therefore, for this case, any value of $\nu$ satisfying

$$\frac{\nu^2}{3} > \frac{m(5m+4)\nu^4}{180} |\mathcal{K}_{\max}|^2 \quad \text{or equivalently} \quad \nu < \sqrt{\frac{60}{m(5m+4)|\mathcal{K}_{\max}|^2}}$$

ensures that $|\angle \widehat{T}_P S, T_P S| \to 0$ as $K \to \infty$. In the scenario where $D$ is dense, we have the stricter condition

$$\rho(D) \le \|D\|_F \le (n-m)\frac{m(5m+4)\nu^4}{180} |\mathcal{K}_{\max}|^2.$$

Thus, for this case, any value of $\nu$ satisfying

$$\frac{\nu^2}{3} > (n-m)\frac{m(5m+4)\nu^4}{180} |\mathcal{K}_{\max}|^2 \quad \text{or equivalently} \quad \nu < \sqrt{\frac{60}{m(n-m)(5m+4)|\mathcal{K}_{\max}|^2}}$$

ensures that $|\angle \widehat{T}_P S, T_P S| \to 0$ as $K \to \infty$. □
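The two ingredients of this proof that are easiest to check numerically are the fourth moment $E[\|x\|^4] = m(5m+4)\nu^4/45$ for $x$ uniform on $[-\nu,\nu]^m$ and the resulting threshold on $\nu$. The sketch below is an illustrative check with hypothetical function names; the Monte Carlo sample size and seed are arbitrary choices.

```python
import math

import numpy as np


def fourth_moment_closed_form(m, nu):
    """E[||x||^4] for x uniform on [-nu, nu]^m, as used in the proof."""
    return m * (5 * m + 4) * nu ** 4 / 45.0


def fourth_moment_monte_carlo(m, nu, samples=200_000, seed=0):
    """Monte Carlo estimate of E[||x||^4] for comparison with the closed form."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-nu, nu, size=(samples, m))
    return float(np.mean(np.sum(x ** 2, axis=1) ** 2))


def nu_bound_quadratic(m, n, k_max, dense=True):
    """Largest nu with nu^2/3 > R * m(5m+4) nu^4 |K_max|^2 / 180.

    R = n - m for a dense D and R = 1 for a diagonal D.
    """
    R = (n - m) if dense else 1
    return math.sqrt(60.0 / (R * m * (5 * m + 4) * k_max ** 2))
```

At the returned width the two sides of the inequality coincide, so any strictly smaller $\nu$ satisfies the condition of the lemma.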

A.2. Proof of Lemma 3. We first recall two recent results on the tail bounds for the eigenvalues of sums of independent random matrices. The first result concerns upper and lower tail bounds on all eigenvalues of a sum of independent positive semidefinite matrices as stated in Theorem 4.1 in [21].

Theorem 5 (Eigenvalue Chernoff Bounds). Consider a finite sequence $\{X_j\}$ of independent random positive semidefinite matrices where $X_j \in \mathbb{R}^{n \times n}$ with $\|X_j\| \le R$ a.s. Given an integer $k \le n$ define

Mk = A fc ($^E[X,]) 
3 

Then 



P(A fc (^X,0 >tii k ) < (n-fc+1) 

3 

-(i-=) 2 M fe 



where t > e and 



\*E-Xj-)<*^*)^*e «• \ s 6(0,1). 

The second result concerns an upper tail bound on the operator norm of a sum of zero-mean independent 
random matrices which can moreover be rectangular. This result is stated in the form of Theorem 1.3 in 

Theorem 6 (Matrix Bernstein: Rectangular Case). Consider a finite sequence $\{Z_k\}$ of independent random matrices, $Z_k \in \mathbb{R}^{d_1 \times d_2}$. Assume that each random matrix satisfies

$$E[Z_k] = 0 \quad \text{and} \quad \|Z_k\| \le R \ \text{a.s.}$$

Define

$$\sigma^2 := \max\Big\{ \Big\|\sum_k E[Z_k Z_k^*]\Big\|,\ \Big\|\sum_k E[Z_k^* Z_k]\Big\| \Big\}.$$

Then for all $t \ge 0$,

$$P\Big(\Big\|\sum_k Z_k\Big\| \ge t\Big) \le (d_1 + d_2) \exp\Big(\frac{-t^2/2}{\sigma^2 + Rt/3}\Big).$$

We now proceed to prove Lemma 3.
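Proof. We have $M^{(K)} = \frac{1}{K}\sum_{i=1}^{K} p_i p_i^T$, where $p_i = [x_i^T\ f_1(x_i)\ \dots\ f_{n-m}(x_i)]^T \in \mathbb{R}^n$. Now,

$$\Big\|\frac{1}{K} p_i p_i^T\Big\| \le \frac{1}{K}\|p_i\|_2^2 \le \frac{1}{K}\Big(m\nu^2 + \frac{1}{4}(n-m)m^2\nu^4|\mathcal{K}_{\max}|^2\Big) = \frac{1}{K}\nu^2 R_M \quad \text{a.s.},$$

where $R_M = m + \frac{1}{4}(n-m)m^2\nu^2|\mathcal{K}_{\max}|^2$. Here we used the fact that

$$|f_i(x)| \le \frac{1}{2} m\nu^2 |\mathcal{K}_{\max}| \quad \text{for } x \in [-\nu, \nu]^m \text{ and } i = 1, \dots, n-m.$$

Furthermore, since $\nu < \nu_{\mathrm{bound,quad}}$,

$$\mu_j = \lambda_j\Big(\sum_{i=1}^{K} E\Big[\frac{1}{K} p_i p_i^T\Big]\Big) = \frac{\nu^2}{3}, \qquad j = 1, \dots, m.$$

Hence, by applying Theorem 5, we have the following for $s_1 \in (0, 1)$:

$$P\Big(\lambda_m(M^{(K)}) \le s_1\frac{\nu^2}{3}\Big) \le (n-m+1)\exp\Big(\frac{-(1-s_1)^2\,\nu^2/3}{2\nu^2 R_M/K}\Big) = (n-m+1)\exp\Big(\frac{-(1-s_1)^2 K}{6R_M}\Big).$$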
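The tail bound $(n-m+1)\exp\big(-(1-s_1)^2K/(6R_M)\big)$ can be inverted to see how many samples keep the failure probability of this first event below a target $p_1$. The sketch below does only that inversion, assuming $R_M = m + \frac{1}{4}(n-m)m^2\nu^2|\mathcal{K}_{\max}|^2$; it is not claimed to reproduce the exact constants in the statement of Lemma 3, and the function names and the symbol `p1` are hypothetical.

```python
import math


def r_m(m, n, nu, k_max):
    """R_M = m + (1/4)(n - m) m^2 nu^2 |K_max|^2."""
    return m + 0.25 * (n - m) * m ** 2 * nu ** 2 * k_max ** 2


def eigenvalue_tail(K, m, n, nu, k_max, s1):
    """Upper bound on P(lambda_m(M^(K)) <= s1 * nu^2 / 3) for K samples."""
    return (n - m + 1) * math.exp(-((1.0 - s1) ** 2) * K / (6.0 * r_m(m, n, nu, k_max)))


def samples_for_eigenvalue_bound(m, n, nu, k_max, s1, p1):
    """Smallest integer K for which the tail bound above is at most p1."""
    R = r_m(m, n, nu, k_max)
    return math.ceil(6.0 * R / (1.0 - s1) ** 2 * math.log((n - m + 1) / p1))
```

The required $K$ grows linearly in $R_M$ and only logarithmically in $n$ and $1/p_1$, which is the source of the $\log n$ factors in the orders discussed in the experiments.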



Then, we have, £> w = i J2?=i lilT where % = [fi(xi) . . . f n - m (xi)] T £ R" m - Furthermore, 



1 - -T 11/ 1 II - ||2^ RdV A 

||< ^ II 9, || 2 < 



where Rd — \{n — rn)m 2 \]C max \ 2 . Applying Theorem [H] for p(D^) = Xi(D^), we can write 

P(p(D {K) ) > s 2 p{D)) < (n - to) 



s 2 p(D)K 



s 2 > e. 



We have seen in Section A.l that p(-D) < RLv . Using this, we obtain the following tail bound: 



, S2 > e. 



(A.l) 



P(p(D (K) ) > s 2 RLv i ) < (n - to) 
We proceed now to derive an upper bound on || || by applying Theorem [Hj First, observe that 

By using the bounds 



(A.2) 



we obtain 



Xi || 2 < vy/m and || q l || 2 < -mv 2 y/n - m\JC max \, 



1 _ m .. 1 1. _ I. I. _ .I Rb v 

K X iQi 11^ ^ II ^ Hall % lla< — ^- 



where i?e = ^m 3/,2 (n — ?n) 1 / 2 |/C maa ;|. Furthermore, we have the following form for the parameter a 2 as 
defined in Theorem [6] 



r 1 k x jc 

a 2 :=max ^ || ft || 2 || Efoajf] H,^^ 



i 1121 



Observe that, for i = 1, . . . , K, we have 



and || E[^gf ] ||= p(-D) < RLv 4 . 
Furthermore, using the aforementioned upper bounds on $\|q_i\|_2$ and $\|x_i\|_2$, we arrive at the following:

2 f (n - m)mV 6 |/C mox | 2 mv 2 

a < max < , p(L)} 

I 12K ' K HK ' 



< max 

where 



(n - m)m 2 v & \K max \ 2 mRLv 6 1 _ v^R G 
Y2K ' K I K 1 



m 2 \K. max \ 2 / R(5m + 4) 

R a := max < n — to, 



12 L 15 

Employing the bounds on || j^Xiqf || and a 2 in Theorem |6j we obtain the following tail bound. 

P(|| BW ||> S3 ) < n exp ^ -^j/^ -^ , S3>0 (A.3) 

Lastly, let < Pi,p 2 ,P3 < 1 denote the upper bounds on the probabilities of the events 

{\ m {M^) < s lV 2 /i) , {p(D™) > s 2 RL^} , {|| ||> ,s 3 } , 

respectively. This is clearly achieved by choosing 

> max ^ (2) ft- (3) 1 - Ku , 

j\ j> max <^j\ bound , J\ bound , J\ bound J — Abound, 

where K^ und , K^ und , K^ und are as defined in the statement of Lemma [3] Applying the union bound, we 
arrive at the stated result. □ 

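A.3. Proof of Theorem 3.

Proof. We start with the following identity for $i = 1, \dots, m$:

$$M^{(K)} u_i = \lambda_i(M^{(K)})\, u_i, \tag{A.4}$$

where

$$M^{(K)} = \begin{bmatrix} A^{(K)} & B^{(K)} \\ B^{(K)T} & D^{(K)} \end{bmatrix}.$$

Here $\lambda_1(M^{(K)}) \ge \lambda_2(M^{(K)}) \ge \dots \ge \lambda_n(M^{(K)})$ denote the eigenvalues of $M^{(K)}$ and $u_i = [u_{i,1}^T\ u_{i,2}^T]^T$ denote its corresponding eigenvectors. Using Eq. (A.4), we obtain the following inequality.

$$B^{(K)T} u_{i,1} + D^{(K)} u_{i,2} = \lambda_i(M^{(K)})\, u_{i,2} \;\Rightarrow\; \big(\lambda_m(M^{(K)}) - \rho(D^{(K)})\big)\, \|u_{i,2}\|_2 \le \|B^{(K)}\|. \tag{A.5}$$

Now, provided that $K$ is chosen such that $K \ge K_{\mathrm{bound}}$, the following events hold with high probability.

$$\Big\{\lambda_m(M^{(K)}) \ge s_1\frac{\nu^2}{3}\Big\}, \quad \big\{\rho(D^{(K)}) \le s_2 R L \nu^4\big\}, \quad \big\{\|B^{(K)}\| \le s_3\big\}, \tag{A.6}$$

where $s_1 \in (0, 1)$, $s_2 > e$ and $s_3 > 0$. From (A.6) and (A.5), we conclude that the following inequality holds with high probability.

$$\Big(s_1\frac{\nu^2}{3} - s_2 R L \nu^4\Big)\, \|u_{i,2}\|_2 \le s_3.$$

The L.H.S. of the above inequality is positive if $\nu < \sqrt{s_1/(3 s_2 R L)}$. Assuming that this is satisfied, we obtain

$$\|u_{i,2}\|_2 \le \frac{s_3}{s_1\frac{\nu^2}{3} - s_2 R L \nu^4} = \sigma_s \;\Rightarrow\; \|U_2\|_F \le \sqrt{m}\,\sigma_s.$$

Furthermore, we have from Lemma 1 that

$$\|U_2\|_F \le \tau < 1 \;\Rightarrow\; |\angle \widehat{T}_P S, T_P S| \le \cos^{-1}\Big(\sqrt{(1-\tau^2)^m}\Big). \tag{A.7}$$

Lastly, we see that Eq. (A.7) is ensured if the following holds:

$$\sqrt{m}\,\sigma_s \le \tau \;\Leftrightarrow\; \frac{\sqrt{m}\, s_3}{s_1\frac{\nu^2}{3} - s_2 R L \nu^4} \le \tau \;\Leftrightarrow\; s_3 \le \frac{\big(s_1\frac{\nu^2}{3} - s_2 R L \nu^4\big)\tau}{\sqrt{m}}.$$

Therefore, for these choices of $\nu$ and the constants $s_1, s_2, s_3$, we get the bound on $|\angle \widehat{T}_P S, T_P S|$ stated in the theorem.

□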

A. 4. Proof of Lemma [4j 

Proof. For K = 00, we have M = M q + A, where 



M„ 



3 J " 




D 



and 



A 



< 2 || B 1 



< 2yJ m(n — m)v8(v) + (n — m)(8(v) 2 + 5(is)mis 2 \JC max \), 

= 2C 8 m^ 2 y/m(n-m)v 4 + (n - m)(C 2 s m 3 v 6 + C s ra 5/ V|/C max |), 

= 2 -Bl \\F,bound + \\ D\ \\ F, bound— \\ A \\F,bound ■ 



(A. 



Now, if there is no perturbation on M q , the eigenvectors {ei, . . . , e m } corresponding to ^- span TpS. As M q 
is actually perturbed by A, we analyze the perturbation of the space formed by the span of {ei, . . . , e m }. 
We first observe from Weyl's inequality [35] the following bounds on the eigenvalues {Aj(M)} i=1 of M: 



Aj(M) G [Xi(M q )- || A \\ F ,bound, K{M q )+ || A WpMound] , i = 1, . . . ,n. 



Here, Xi(M q ) 



for i = 1, 



. , m. Furthermore, {Ai(M 9 )}™ =m+1 are the eigenvalues of D. In order to 



analyze the perturbation on spanjei, . . . , e m }, we would like to guarantee the 'separation' of {Aj(M)}™_ 1 
from {Aj(M)}" =m , v Denoting p(D) to be the spectral radius of D, we have the following sufficient condition 
to guarantee this separation. 

,,2 



- II A 



F,boun 



d > p(D)+ || A || F , 



bound 3 



p{D) > 2 || A 



(A.9) 



Now, as shown in Section 



A.l 



p(D) < RLu\ where L = m ( 5m +W™*\ ^ and 



R 



1 ; if D is diagonal 
(n — m) ; if D is dense. 



Using this fact along with Eq. (A. 8 1 in Eq. (A.9), we arrive at the following sufficient condition that 
guarantees the separation of eigenvalues: 



— - RLu 4 > (3 2 ^ + /3 3 ^ 5 + /3 4 ^ 6 
(fit + RL)u 2 + (3 3 v 3 + pas 4 < 



1 



(A.10) 
(A.ll) 



where (3 2 = 4C s m 3 ^ 2 sj m(n — to), /? 3 = 2(n — m)C s m b l 2 \K max \ and (3^ = 2(n — m)C 2 m 3 . Now, clearly the 
solution to Eq. (A. 10) needs to satisfy the following conditions. 

(ft + RL)v 2 < 1/3 & v < (3(/3 2 + RL))- 1 ' 2 , 

fois 3 < l/3&v< (3/3 3 )" 1/3 , 

/3 4 ^ 4 < ^ ^ < (3/3 4 )- 1/4 . 



:S0 
Equivalently, the solution to Eq. (A.10) satisfies $\nu < \alpha$, where



a = mm 



{(3(ft + RL))- 1 ' 2 , (3ft)- 1 / 3 , (3ft)- 1 / 4 } 



We thus arrive at the following sufficient condition on v in order to guarantee Eq. ( A. 11 1 and consequently 

1 



Eq. (A.9I: 



i/((ft + RL) + /3 3 a + /3 4 a z ) < -, 



v < 



1 

3[(ft + RL) + fta + fta 2 



(A.12) 



We now proceed to bound the angle between TpS and TpS by using the identity Mui = \i(M)ui, Vi 
1, . . . , m. We obtain 

B\u iA + (D + D x )u it2 = Xi(M)u i<2 . 
By taking the ^-norm of both sides and using the fact that || Dui 2 ||2< p{D) \\ Ui. 2 \\ 2 , we obtain 

(Xi(M) - p(D)- || D 1 \\ F ) || u i<2 \\ 2 <\\ B 1 \\ F . 



Now, \{M) > || A \\p for i = l,...,m. Therefore, if v is chosen to satisfy Eq. (A.12), then the 

following holds true. 

K(M) - p(D)- || D 1 \\ F > V -- || A || F -p{D)- || D 1 \\ F , 

> "g RLv — 2(| £?i Unbound + || D\ \\ F .bound), 

> 0. 

Using the above facts, we obtain the following upper bound on || Ui 2 lb- 

II - 11 „ II \\F,bound 

II u i 2 1 1 2 < ~ 2 = '''oo ■ 

^ i?L^ 4 — 2(|| Bi \\ F ,bound + || £>1 \\F,bound) 

Finally, to conclude the proof, we obtain the bound on \ZTpS,TpS\ by using Lemmajl] 



(A.13) 



II u 2 HI 



II 111 < mcr^, 



i=l 
2 \m 



=>cas i (Z.TpS,T P S) > (l-\\U 2 \\ F ) m > (1 



2 \m 



□ 



A.5. Proof of Theorem H 

Proof. The proof follows along the lines of the proof of Theorem [3j We start with the following identity for 
i = l,...,m: 

M^Ui = Xi(M^)ui. (A.14) 

We have, = M^ K) + A< A ') where 



MW = 



and A(*> = 







Let Ai(Af^) > A 2 (AfW) • • • > \ n {M ^) d enote the eigenvalues of AfW and Uj = [u^ u T i 2 \ T denote its 
corresponding eigenvectors. Using Eq. ( |A.14 ), we obtain the following: 

{B (Kf + B ^ T )u itl + (D<*> + D^)u ij2 = Ai^K, 

=> (A ro (M W) - p(I? (K) )- || Di ||f,6ow) II u<,2 || 2 (A.15) 

< || B^ K > || + || _Bx \\ F> bound ■ 

We observe by Weyl's inequality [35] that the following holds true: 

Am(MW) > ^(Mf))- || A || F , bound = A m (A/W) - 2 || Sj || Fj6ourid - || D x \\ Fibound . (A. 16) 
If $K$ is chosen such that $K \ge K_{\mathrm{bound}}$, the following events hold with high probability:

L m (MW)> Sl ~y (p(DW) < s 2 RLv 4 } , {||BW||<* 3 }, (A.17) 

where si € (0, 1), s 2 > e and S3 > 0. Thus, using Eq. (A.16I and Eq. (A.17) in Eq. (A. 15), we obtain the 
following: 

v 2 

(Sl-^j S 2 RLv — 2(|| Bi \\F,bound + || D\ \\F.bound)) | Mj,2 ||2< S3+ || B\ \\F,bound ■ 

Similarly to the proof of Lemma [4j one can show that the following condition is sufficient to ensure that the 
L.H.S. of the above inequality is strictly positive 

si 



v 2 < 



(A.18) 



3[(/3 2 + s 2 RL) + (3 3 a + /3 4 a 2 }' 
In particular, the above condition ensures the following: 
v 2 

Sl— S 2 RLv > 2 || A \\F,bound= 4 II B\ \\F,bound +2 || D\ \\F,bound ■ 

Now, assuming that v satisfies Eq. (A.18), we arrive at the following bound on || u i2 || 2 for i = 1, . . .,m: 

s 3 + || Bl \\F,bound 



Ui 2 2 < 



( Sl f - S 2 RL^) - 2(11 B l \\ F ,bound + || D 1 \\ F ,bound) 



The above bound on || Ui t2 \\ 2 implies that || U 2 \\ 2 F < ma 2 . Let 

|| Bi \\F,bound 

If for some r G (0, 1) 
then from Lemma [T] we obtain 



{ s l% S 2 RLv 4 ) — 2(|| Bi \\F,bound + || D\ || F,bound) 

2,2, 2 i. ^ f 2 I , 2\l/2 

mcr s < r + mul <^> cr s < (t /m + <7y) ' , 
cos 2 (ZT P S, TpS) > 1 - r 2 - ma 2 ,. 



Finally, we see that Eq. (A. 19) is ensured if the following holds. 

s 3 + || Bi || F, bound 



< 



{Si\ - S 2 RLlS 4 ) - 2(|| Bi \\ FM und + || Di \\ F ,bound) \ m 



1/2 



s 3 < [( s i^r ~ s 2 RLv 4 ) - 2(|| Bi \\ F ,bound+ II -Di ||f,6ow)] 



1/2 



Bi 



F.bound 



This completes the proof. 



(A.19) 



□ 



